
Geometric Deep Learning for Computer-Aided Design: A Survey
Negar Heidari and Alexandros Iosifidis
Department of Electrical and Computer Engineering, Aarhus University, Denmark
Emails: {negar.heidari, ai}@ece.au.dk

Abstract—Geometric Deep Learning techniques have become a transformative force in the field of Computer-Aided Design (CAD), and have the potential to revolutionize how designers and engineers approach and enhance the design process. By harnessing the power of machine learning-based methods, CAD designers can optimize their workflows, save time and effort while making better informed decisions, and create designs that are both innovative and practical. The ability to process the CAD designs represented by geometric data and to analyze their encoded features enables the identification of similarities among diverse CAD models, the proposition of alternative designs and enhancements, and even the generation of novel design alternatives. This survey offers a comprehensive overview of learning-based methods in computer-aided design across various categories, including similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds. Additionally, it provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain. The final discussion delves into the challenges prevalent in this field, followed by potential future research directions in this rapidly evolving domain.

Index Terms—geometric deep learning, machine learning, graph neural networks, computer aided design, automated CAD design, 2D-3D shape modeling

I. INTRODUCTION

Deep Learning has revolutionized data processing across different domains spanning from grid-structured data in Euclidean spaces, such as image, video, text, to graph-structured data with complex relationships. Geometric Deep Learning (GDL) encompasses a wide array of neural network architectures ranging from Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), to Transformer Networks, all of which encode a geometric understanding of the data such as symmetry and invariance as an inductive bias in their learning process [1]. Over the past decade, GDL methods, including CNNs, GNNs, RNNs, and Transformer Networks, have made remarkable strides in diverse tasks coming from different applications, including computer vision, natural language processing, computer graphics, and bioinformatics. However, the application of GDL methods on complex parametric Computer-Aided Design (CAD) data is rarely studied. Boundary Representation (B-Rep), which is a fundamental data format encoding a high level of parametric detail in CAD models, can be used to learn the intricate geometric features of such models. Although GDL methods have seen considerable success in analyzing 3D shapes in mesh [2], [3], voxel [4]–[6], and point cloud [7], [8] data formats, extracting feature representations directly from B-Rep data poses challenges. Converting B-Rep data to conventional formats like triangle meshes is not only computationally expensive but also leads to information loss [9]. Hence, learning feature representations directly from B-Rep data becomes imperative, ensuring an efficient capture of the most representative geometric features in CAD models for subsequent analyses without the burden of extensive computation and memory usage.

Recently, there has been great research interest in leveraging GDL methods for learning the structure of CAD models and for facilitating the design process in different aspects. While conventional machine learning and deep learning approaches have been explored for CAD classification and clustering tasks [10]–[12], these methods predominantly focus on learning feature representations based on physical features like volume, bounding box, area, density, mass, principal axes, or directly work on other data formats for 3D data, such as point cloud, without considering B-Rep. Accordingly, they often fall short in capturing the concise geometric properties embedded in CAD models.

As GDL methods continue to evolve, they are expected to play a pivotal role in shaping the future of CAD design across industries. In recent years, more advanced GDL methods such as GNNs and (Graph) Transformer Networks have shown great potential in learning the complex geometric features embedded in CAD models, particularly from B-Rep data, irrespective of their physical features [9], [10], [13]–[16]. These learned feature representations serve various purposes, such as reconstructing CAD models [15]–[17], determining joints between CAD solids [18], [19], and autocompleting unfinished CAD models [20]–[23]. The overarching objective of these learning-based approaches is to elevate the level of automation in the CAD design process. By alleviating the need for experts to perform repetitive and time-consuming tasks, such as sketching and drafting, these methods have the potential to empower designers to focus on the more creative facets of the design process. By analyzing historical design data, extracting geometric features, and discerning valuable insights like similarities between CAD models, these methods can facilitate the reuse of CAD models in new products. This not only saves time and resources by preventing redundant designs but also aids designers in customization by generating design alternatives based on specific parameters and objectives. Furthermore, there is a growing demand for CAD tools that can reverse engineer models and generate diverse design options based on concise parameters derived from preliminary
concept sketches. GDL-based methodologies have made some progress in meeting this demand [22], [24]. However, the introduction of machine learning-based methods in this area is not without challenges. A major challenge is the scarcity of annotated CAD datasets in B-Rep format. Unlike other data formats such as images, videos, or text, collecting CAD models is intricate and time-consuming, typically requiring the expertise of skilled engineers. Moreover, these datasets often remain proprietary and inaccessible to the public. On the other hand, annotating CAD datasets, especially for mechanical objects, demands substantial domain knowledge. Given that training deep learning methods without large-scale annotated datasets is impractical, recent research efforts have also made valuable contributions by providing benchmark CAD datasets. As this field is still emerging, there is plenty of room for future research and development, and several challenges need to be explored and addressed. Our survey, to the best of our knowledge, stands as the first comprehensive review in this rapidly evolving domain, which endeavors to provide a thorough exploration of recent advancements, challenges, and contributions in this area. Figure 1 shows the structure of the survey and highlights the most representative methods in the field reviewed in more detail. As we embark on this survey, our goal is to serve as a guiding resource for researchers and practitioners eager to navigate the dynamic intersection of GDL and CAD. The survey is tailored to provide:

• A comprehensive review of the state-of-the-art GDL methods employed in the analysis of CAD data. This encompasses diverse categories, including similarity analysis, classification, and retrieval, as well as segmentation, along with the synthesis of 2D and 3D CAD models. Furthermore, the survey explores techniques for generating CAD models from alternative data representations, such as point clouds.
• A detailed overview of benchmark CAD datasets crucial for the advancement of research in this field, accompanied by open-source codes that have been instrumental in pushing the boundaries of achievable outcomes.
• An in-depth discussion about the challenges that persist in this domain, followed by potential future research directions derived by assessing the limitations of existing methodologies.

II. BACKGROUND

A. Computer Aided Design (CAD)

Computer-Aided Design (CAD) is a manufacturing technology that has revolutionized the way engineers, architects, designers, and other professionals create and visualize designs [35]. The CAD process involves the use of specialized software to design, modify, analyze, and optimize 2D drawings and 3D models of physical objects or systems digitally before constructing them. Engineering drawing entails the use of graphical symbols such as points, lines, curves, planes and shapes, and it essentially gives a detailed description of any component in a graphical form. There are several CAD software options which are widely used in industry for different purposes.

1) CAD Tools and Python APIs: To name some of the most popular software options, AutoCAD [36] and Fusion 360 [37] (a cloud-based software) offer 2D/3D design tools for a variety of industries. SolidWorks [38] and CATIA [39] provide parametric modeling and simulation for the mechanical, aerospace, and automotive industries. OnShape [40] (a cloud-based software) and Creo [41] offer 3D parametric design and simulation. TinkerCAD [42] is a free-of-charge and web-based CAD design tool which is mostly used for beginners and educational purposes.

For CAD parsing and development, several Python APIs are available, providing tools and libraries to work with CAD data, create 3D models, visualize CAD models, and perform various related operations. To name some of the most popular Python APIs for CAD, PythonOCC [43] offers an open-source 3D modeling and CAD library for Python. It is based on OpenCASCADE Technology (OCCT) [44], which is a powerful geometry kernel and modeling framework used for CAD. PythonOCC provides Python bindings to the OpenCASCADE C++ library, allowing developers to use Python for creating, manipulating, visualizing and analyzing 3D geometry and CAD models. OCCWL [45] has been recently released as a simple, lightweight Pythonic wrapper around PythonOCC. CADQuery [46], a Python library for scripting 3D CAD models, is also built on top of the OpenCASCADE geometry kernel.

Another powerful 3D geometric modeling kernel is Parasolid [47], which provides a set of functions and methods for creating, manipulating, visualizing and analyzing 3D solid models. However, unlike OpenCASCADE, Parasolid is not open-source and software developers need to license it to integrate its capabilities into CAD applications.
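As a small illustration of scripting parametric CAD with such libraries, the following CADQuery snippet (a minimal sketch assuming a CADQuery 2.x installation; the dimensions and file name are arbitrary placeholders) draws a 2D profile, extrudes it into a solid, cuts a through-hole, and exports the result as a STEP (B-Rep) file:

```python
import cadquery as cq

# Sketch a 2D rectangle on the XY plane, extrude it into a solid body,
# then cut a through-hole in the top face.
result = (
    cq.Workplane("XY")
    .rect(40.0, 30.0)   # 2D sketch primitive (profile)
    .extrude(10.0)      # extrude operation: sketch -> 3D body
    .faces(">Z")        # select the top face
    .workplane()
    .hole(8.0)          # drill a through-hole
)

# Export the parametric solid as a STEP file (B-Rep format).
cq.exporters.export(result, "plate.step")
```

The exported STEP file retains the parametric topology (faces, edges, vertices), unlike a mesh export, which would tessellate the geometry.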
along with the synthesis of 2D and 3D CAD models. Fur- not open-source and software developers need to license it to
thermore, the survey explores techniques for generating integrate its capabilities into CAD applications.
CAD models from alternative data representations, such 2) Terminology in CAD Design: Boundary Representation
as point clouds. (B-Rep), serves as a fundamental modeling technique utilized
• A detailed overview of benchmark CAD datasets crucial in CAD to represent intricate 3D models along with their
for the advancement of research in this field, accompanied geometrical attributes. It facilitates accurate and consistent
by open-source codes that have been instrumental in design, modification, manipulation, analysis and representation
pushing the boundaries of achievable outcomes. of 3D entities by characterizing surfaces, curves, points, and
• An in-depth discussion about the challenges that persist topological relationships between them in 3D space. In Table
in this domain followed by potential future research I, we introduce the basic terminology of CAD design which is
directions by assessing the limitations of existing method- essential for understanding the concepts and methodologies in
ologies. this field. However, depending on the industries and different
design software applications that are used for design, there
II. BACKGROUND might exist more specialized terms and concepts that are
not covered here. For the sake of maintaining consistency
A. Computer Aided Design (CAD) throughout this paper, we will employ this terminology for
Computer-Aided Design (CAD) is a manufacturing tech- all methodologies and datasets discussed in the subsequent
nology that has revolutionized the way engineers, architects, sections.
designers, and other professionals create and visualize designs
[35]. CAD process involves the use of specialized software
to design, modify, analyze, and optimize 2D drawings and B. Geometric Deep Learning (GDL)
3D models of physical objects or systems digitally before Geometric Deep Learning (GDL) has arisen as a special
constructing them. Engineering drawing entails the use of and fundamental branch of artificial intelligence (AI) that
graphical symbols such as points, lines, curves, planes and expands deep learning approaches from regular Euclidean data
shapes, and it essentially gives detailed description about any to complex geometric structured data [1]. While traditional
component in a graphical form. There are several CAD soft- deep learning methods have led to great advancements in
ware options which are widely used in industry for different different applications by processing regular Euclidean data
purposes. structures, such as audio, text, images and videos, formed by
CAD analysis with GDL
• Representation Learning on CAD
  - CAD Classification and Retrieval [9], [13]
  - CAD Segmentation [9], [14], [25]
  - CAD Assembly [18], [19]
• CAD Construction with Generative Deep Learning
  - Engineering 2D Sketch Generation [20], [21], [23], [26], [27]
  - 3D CAD Generation
    - 3D CAD Generation from Sketch [22], [24], [28]
    - 3D CAD Command Generation [17], [29]–[31]
    - 3D CAD Generation with Direct B-Rep Synthesis [15], [16]
    - 3D CAD Generation from Point Cloud [32]–[34]

Fig. 1: Taxonomy and the structure of the most representative methods reviewed in this survey.

regular 1D, 2D and 3D grids, GDL is tailored for processing and extracting intricate spatial and topological features from irregularly structured data like 3D shapes, meshes, point clouds, and graphs. These irregular data structures processed by GDL can be either Euclidean or non-Euclidean, depending on the context and the geometric properties they possess. GDL has gained significant research attention in the CAD domain for its ability in learning complex geometric features and facilitating the design process for engineers and designers.

Graph neural networks (GNNs) [48], [49] are one of the most popular types of GDL approaches which excel in processing graph-structured data. GNNs have made remarkable strides in diverse tasks coming from different applications, such as computer vision [50]–[52], bio-chemical pattern recognition [53], [54], and financial data analysis [55]. GNNs offer a specialized approach for modeling complex CAD geometric structures encoded in B-Rep, as a graph which represents the topology and geometry of 3D shapes through nodes and edges. GNN models can effectively capture the topological structure, connectivity, and attributes inherent in graphs (B-Reps) and allow hierarchical analysis by considering local and global contexts, aligning with the complex structure of B-Rep models. GNNs are also able to extract and propagate geometric features through the graph, capturing subtle aspects of shape and curvature and leveraging data-driven insights from existing B-Rep models, facilitating tasks like solid and/or sketch segmentation [9], [14], [56], shape reconstruction and generation [16], [20], and shape analysis and retrieval [9], [13]. By representing a B-Rep as a graph and leveraging GNN capabilities to capture its topological structure, connectivity, and geometry, designers and engineers can analyze, optimize, and create intricate 3D models with enhanced precision and efficiency.

Let us present a more detailed explanation of how B-Rep models are represented as graphs. In the context of B-Rep, graph nodes can represent points, lines, and faces with associated attributes like geometric coordinates, line lengths, curvature, face area and surface normals. Edges in the graph represent connections between nodes. For B-Rep, these connections indicate constraints, like adjacency, tangency, or coincidence, between primitives like points, lines, faces, etc. GNN methods embed each node's features, creating node representations that capture geometric attributes and aggregate information from neighboring nodes. In the B-Rep context, this simulates the way geometric properties flow through adjacent points, lines, curves, and faces. Through feature
Model: 2D or 3D representation of a real-world object or system created within a CAD software.
B-Rep: Boundary Representation, a data format for geometrically describing objects by representing their topological components such as surfaces, edges, vertices, and the relationships between them.
Sketch: 2D drawing of an object, serving as the basis for creating 3D models, made of lines, curves, and other basic geometric shapes.
Extrude: A CAD operation for expanding a 2D drawing along a specific dimension to create a 3D model.
Primitive: A basic 2D/3D geometric shape which serves as a backbone for more complex designs. For 2D sketches, primitives are points, lines, arcs, circles, ellipses and polygons. For 3D shapes, primitives are cubes, spheres, cylinders, cones.
Constraint: Geometric relationship, such as coincident, tangent, perpendicular, between primitives in a 2D/3D design.
Point: Basic geometric entity representing a specific location in space with 3D coordinates X, Y, Z.
Line: A straight path between two points in a 3D space.
Circle: A closed curve defined by its center and radius.
Arc: A curved line, as a part of a circle or ellipse, defined by a center, start and end points.
Loop: A closed sequence of curves forming the boundary of a surface.
Profile: A closed loop made by a collection of curves joined together.
Face: A 2D surface bounded by a profile.
Edge: A piece of a curve bounded by two points, defining the boundaries of a face.
Shell: A collection of connected faces with joint boundaries and points.
Body/Part: A 3D shape created by extruding a face.
Solid: A 3D shape that is used as a building block to design an object.
Component: Building blocks of assemblies containing one or more bodies (parts).
Assembly: A collection of connected bodies (or parts) to create a larger object.
Joint: The connection, defined by coordinate and axis, between bodies (parts) to make an assembly.
Topology: The structure of points, curves, faces and other geometric components in a 3D model.
Wireframe: Visual representation of 3D B-Rep models, showing the structure of the object with only lines and curves. Also known as the skeletal representation of 3D objects.
Rendering: The process of visualizing CAD models to simulate their real-world appearance.
Surface Normal: A unit vector perpendicular to a surface at a specific point.

TABLE I: Terminology in CAD design. A summary of the widely used terms through different research works and CAD platforms.

STEP (.txt, .step, .smt): Original parametric B-Rep format containing an explicit description of the topology and geometry information of the CAD models.
STL (.stl): Standard Tessellation Language, a format describing 3D surfaces of an object with triangular facets (meshes).
Obj (.obj): A format describing 3D surface geometry using triangular meshes.
Statistics (.yml, .txt): Statistical information of CAD models.
Features (.yml): A description of the properties of surfaces and curves with references to the corresponding vertices and faces of the discrete triangle mesh representation.
Image (.png): Visualization of objects produced by the rendering process.
PCD (.ASC, .PTX, .OBJ): Point Cloud Data, a collection of points in the 3D space representing the structure of a 3D model.

TABLE II: A summary of different file formats and their corresponding descriptions. Every CAD dataset provides at least one of these data formats to be processed for different purposes.

aggregation, the method updates each node's representation by combining its own features with information from its neighbors. This step captures the influence of neighboring geometry. The core operation in GNNs is the graph convolution, which is combined with a nonlinear function applied on the fused information from neighboring nodes. In a multi-layer GNN, the graph convolution operation captures multi-hop neighboring features, allowing for hierarchical analysis. For B-Rep, this operation helps to capture the hierarchical relationships between primitives. To obtain a global representation of the B-Rep model, GNNs can employ global pooling, summarizing the entire graph information into a single feature vector. Figures 2 and 3 show an example of a 3D shape and a 2D sketch represented as graphs, respectively.

Fig. 2: An example of a 3D solid represented as a graph, where the solid primitives such as curves and surfaces are represented as graph nodes, and the adjacency relationships between the solid primitives are represented as graph edges.
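To make the aggregation and pooling steps concrete, the following minimal sketch implements one round of mean-aggregation message passing over a toy face-adjacency graph (the features, weights, and graph below are hypothetical placeholders, not taken from any specific CAD model or published architecture):

```python
import numpy as np

def gnn_layer(H, adj, W_self, W_neigh):
    """One message-passing layer: each face node fuses its own
    features with the mean of its neighbors' features (then ReLU)."""
    H_new = np.zeros_like(H @ W_self)
    for v, neighbors in adj.items():
        msg = H[neighbors].mean(axis=0) if neighbors else np.zeros(H.shape[1])
        H_new[v] = np.maximum(0.0, H[v] @ W_self + msg @ W_neigh)
    return H_new

# Toy face-adjacency graph of a solid: 4 faces with 8-dim geometric
# features (e.g., face area, surface-type one-hot, curvature statistics).
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}  # edges = shared B-Rep edges
W_self, W_neigh = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))

H = gnn_layer(H, adj, W_self, W_neigh)   # updated node (face) embeddings
graph_embedding = H.max(axis=0)          # global max-pooling over all faces
print(graph_embedding.shape)             # (16,)
```

Stacking several such layers lets information from multi-hop neighborhoods reach each face, which is the hierarchical analysis described above.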

The most popular GNN architectures, i.e., Graph Convolutional Networks (GCNs) [57]–[59] and Graph AutoEncoders (GAEs) [60], are used for (semi-)supervised and unsupervised learning on graphs, respectively, addressing different graph analysis tasks such as node/graph classification and graph reconstruction/generation. As GNN-based methods continue to evolve, they are expected to play a pivotal role in shaping the future of CAD design across industries. In the following sections, we delve into the details of the current state-of-the-art methods in this area and the challenges.

III. DATASETS

Large data collections play a pivotal role in enhancing the performance of deep learning models when applied in problems coming from different applications. Collecting such large
datasets in regular formats such as image, video, audio, text, and their distribution through different platforms like social media has greatly accelerated the progress of deep learning in computer vision and natural language processing. GDL offers advantages for tasks like 3D shape analysis, shape reconstruction, and building geometric feature descriptors. Nonetheless, creating and annotating high-quality 3D geometric data needs a significant level of domain knowledge, and engineering and design skills. Collecting such datasets is also challenging due to various factors such as concerns about proprietary rights and ownership, and the lack of consistency and compatibility among data from available sources.

Fig. 3: An example of a simple 2D sketch represented as a graph, where the sketch primitives such as curves (circle, arc, line, etc.) are modeled as graph nodes, and the constraints between these primitives (coincident, tangent, mirror, etc.) are shown as graph edges. The SketchGraphs [20] dataset contains such 2D sketches modeled as graphs.

Table III provides a summary of the existing datasets along with their properties. The details about each dataset are provided in the sections dedicated to the methodologies linked to each dataset. Table II introduces different CAD data formats along with their corresponding descriptions. Existing commonly used 3D CAD datasets mostly provide mesh geometry for 3D shape segmentation, classification and retrieval [5], [61]–[64], human body mesh registration [65], 2D/3D shape alignment [66] and 3D scene categorization, semantic segmentation and object detection [67], [68]. Engineering shape datasets such as ESB [69], MCB [70], AAD [71], FeatureNet [72], and CADNet [73] also provide annotated data with mesh geometry for mechanical shape classification and segmentation. The primary limitation of these datasets is their lack of the parametric and topological features of curves and surfaces, commonly referred to as boundary representation (B-Rep). These B-Rep features are essential for conducting parametric CAD shape analysis. Recently, several geometric B-Rep datasets of different sizes and properties were introduced to boost GDL progress on CAD design. FabWave [74] is a collection of 5,373 3D shapes annotated with 52 mechanical part classes, including gears and brackets, and Traceparts [75] is a small collection of 600 CAD models produced by different companies labeled into 6 classes (100 CAD models in each class), including screws, nuts, and hinges, which can be used for 3D shape classification. MFCAD [76] is a synthetic 3D segmentation dataset containing 15,488 shapes with annotated faces of 16 classes, which can be used for parametric face segmentation in CAD shapes.

Advancing across various tasks in GDL, and effective training of deep learning models, necessitate large parametric CAD data collections. The existing parametric CAD datasets with B-Rep data are limited in size and insufficient to meet these demands. To address this shortfall, three big datasets, ABC [77], Fusion 360 Gallery [30], and SketchGraphs [20], were introduced recently and provide valuable resources for research in this area. ABC [77] is the first large-scale, real-world, and hand-designed dataset with over 1 million high-quality 3D CAD models covering a wide range of object types. Each CAD model consists of a set of precisely parameterized curves and surfaces, offering accurate reference points for sharp geometric feature representation, patch segmentation, analytic differential measurements, and the process of shape reconstruction. The CAD models in the ABC dataset are compiled and collected via an openly accessible interface hosted by Onshape [40], and an open-source geometry processing pipeline has been developed to process and prepare the CAD models in the ABC dataset to be used by deep learning methods.1

Training machine learning models to facilitate CAD construction and synthesis can significantly benefit designers by minimizing the time and effort required for the design process. An essential necessity for tasks related to CAD (re-)construction is to understand how a CAD model is designed and how to interpret the provided construction information in the dataset. The construction history information in the ABC dataset can only be retrieved by querying the Onshape API in a proprietary format with limited documentation, which makes it challenging to develop CAD (re-)construction methods on that dataset. The SketchGraphs dataset has been produced to fill this gap by providing a collection of 15 million human-designed 2D sketches of real-world CAD models from the Onshape API. 2D sketches representing geometric primitives (like lines and arcs) and constraints between them (like coincidence and tangency) can be seen as the basis of 3D shape parametric construction, and each 2D sketch is presented as a geometric constraint graph where nodes denote 2D primitives and edges are geometric constraints between nodes (Figure 3). The SketchGraphs dataset can be used for various applications in automating the design process, such as auto-completing sketch drawing by predicting sequences of sketch construction operations or interactively suggesting next steps to the designer, and autoconstraint application, where the method predicts a set of constraints between the geometric primitives of a sketch. Other potential applications are CAD inference from images, where the method receives a noisy 2D drawing image and infers its design steps to produce the corresponding parametric CAD model, and learning semantic representations encoded in sketches. An open-source Python pipeline for data processing and preparation for deep learning frameworks also comes along with the dataset.2 Similar to SketchGraphs, freehand 2D sketch datasets such as [78]–[80] have also been introduced to tackle this challenge by providing the sketch construction sequence.

1 https://deep-geometry.github.io/abc-dataset/
2 https://github.com/PrincetonLIPS/SketchGraphs
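As an illustration, a toy geometric constraint graph in the spirit of SketchGraphs can be assembled with networkx (the primitives, parameters, and constraints below are invented for illustration, not drawn from the dataset):

```python
import networkx as nx

# A sketch as a geometric constraint graph: nodes are 2D primitives,
# edges are constraints between them. A MultiGraph allows several
# constraints between the same pair of primitives.
sketch = nx.MultiGraph()
sketch.add_node("l1", kind="line", params=dict(x0=0, y0=0, x1=10, y1=0))
sketch.add_node("a1", kind="arc", params=dict(cx=10, cy=5, radius=5))
sketch.add_node("c1", kind="circle", params=dict(cx=5, cy=5, radius=2))

sketch.add_edge("l1", "a1", constraint="coincident")  # arc starts where line ends
sketch.add_edge("l1", "a1", constraint="tangent")     # smooth transition
sketch.add_edge("l1", "c1", constraint="distance")    # hole offset from the edge

for u, v, data in sketch.edges(data=True):
    print(u, v, data["constraint"])
```

A generative model over such graphs predicts the next primitive node or constraint edge given the partial graph, which is exactly the auto-completion setting described above.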


Fusion 360 Gallery [30] has been introduced recently by Autodesk as the first human-designed 3D CAD dataset and environment with 3D operation sequences for programmatic CAD construction. The Fusion 360 Gallery dataset comes along with an open-source Python environment named Fusion 360 Gym for processing and preparing the CAD operations for machine learning methods. This dataset contains both 2D and 3D geometric CAD data which are produced by the users of the Autodesk Fusion 360 CAD software and are collected into the Autodesk Online Gallery. Several datasets for different learning goals, such as CAD reconstruction [30], CAD segmentation [14], and CAD assembly [18], are created based on a total of 20,000 real-world designs in Fusion 360 Gallery.

IV. CAD REPRESENTATION LEARNING

Studying extensive data repositories and uncovering hidden features in data for various tasks such as similarity analysis and shape retrieval has been a vibrant field of research in machine learning and artificial intelligence. The significance of this concept extends to CAD data as well. Machine learning-based similarity analysis of CAD models can effectively facilitate the design process for designers by categorizing designs and retrieving similar CAD models as design alternatives. It has been shown that approximately 40% of new CAD designs could be constructed based on existing designs in the CAD repository, and at least 75% of design processes leverage existing knowledge for designing a new CAD model [82]. Learning from CAD data and extracting features from extensive collections of geometric CAD shapes is important for CAD retrieval and similarity analysis, and it involves employing various machine learning and deep learning methods for extracting the geometric attributes encoded in B-Rep data to assess similarity between pairs of CAD models.

The first step in this process is to select a subset of geometrical, topological, functional and other properties of the CAD model, based on specific analysis goals, to be represented in a suitable format, such as numerical vectors, matrices, tensors, or graphs, for the deep learning methods to process. Nevertheless, representing B-Rep data is challenging due to the coexistence of continuous non-Euclidean geometric features and discrete topological properties, making it difficult to fit into regular structured formats such as tensors or fixed-length encodings. A key contribution of each state-of-the-art method in this topic is the introduction of a method for encoding or tokenizing B-Rep data into a format suitable for the adopted deep learning architecture, tailored to the particular application they are addressing. The deep learning methods receive these representations as input and learn to classify, cluster, segment, or reconstruct CAD models, considering the annotations at hand. However, the application of machine learning and deep learning methods to CAD models is very rare because of the scarcity of annotated CAD data available in the B-Rep format. In contrast to geometric CAD datasets like ShapeNet [62], many parametric CAD datasets are not publicly released due to the proprietary nature of design data.

While there have been recent releases of small annotated datasets containing mechanical parts in B-Rep format for machine learning research [75], the majority of large-scale public databases, such as ABC, remain predominantly unlabeled. Additionally, not only does the process of manually annotating B-Rep data in a specialized format need engineering expertise, but it is also very time-consuming and costly, thus posing a significant limitation. Consequently, deep learning approaches that do not rely on external annotations, such as unsupervised and self-supervised learning, become particularly crucial alongside traditional supervised learning approaches in such scenarios. Learning feature representations based on intrinsic data features, without external annotations or expert knowledge, proves highly advantageous in overcoming the scarcity of annotated CAD data. In this section, we introduce existing research endeavors that employ GDL to extract feature representations from CAD B-Reps. These methods operate in either a supervised, self-supervised, or unsupervised manner, catering to various tasks such as classification, segmentation, similarity analysis and retrieval, and assembly prediction.

A. CAD Classification and Retrieval

Designing methodologies for categorizing 3D components within B-Rep assemblies is significantly important for various applications, such as re-using similar CAD components in different assemblies, shape recommendation and alternative suggestion, and product complexity estimation, especially when 3D CAD models with substantially different geometries are within the same category and share similar topology. One of the first deep learning methods proposed to work directly on the B-Rep data format of 3D CAD models is UV-Net [9]. UV-Net proposed a cohesive graph representation for B-Rep data by modeling topology with an adjacency graph, and modeling geometry in a regular grid format based on the U and V parameter domains of curves and surfaces. One of the main contributions of this work is to extract crucial geometric and topological features out of B-Rep data and make a grid data structure out of complex B-Rep data to feed to deep learning models for different tasks such as B-Rep classification and retrieval. To generate grid-structured feature representations from B-Rep data, this approach transforms each 3D surface into a regular 2D grid by performing surface sampling with fixed step sizes. In a similar fashion, it transforms each solid curve into a 1D grid. The resulting 1D/2D grid mappings are referred to as UV-grids. Each sampled point in a surface's 2D grid conveys three distinct values across 7 channels: a) the 3D absolute point position, represented as xyz in the UV coordinate system, b) the 3D absolute surface normal, and c) the trimming mask with 1 and 0 denoting samples in the visible region and trimmed region, respectively. For the sampled points in a curve's 1D grid, the encoding includes the absolute point UV coordinates and, optionally, the unit tangent vector.
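The UV-grid construction can be sketched as follows. This is a simplified illustration: the quarter-cylinder surface, its normal, and the grid resolution are placeholder choices, and real B-Rep surfaces would be evaluated through a geometry kernel such as OpenCASCADE rather than hand-written lambdas.

```python
import numpy as np

def uv_grid(surface_fn, normal_fn, mask_fn, size=10):
    """Sample a parametric surface into a (size, size, 7) UV-grid:
    channels 0-2 = xyz position, 3-5 = surface normal, 6 = trim mask."""
    grid = np.zeros((size, size, 7))
    for i, u in enumerate(np.linspace(0.0, 1.0, size)):
        for j, v in enumerate(np.linspace(0.0, 1.0, size)):
            grid[i, j, 0:3] = surface_fn(u, v)
            grid[i, j, 3:6] = normal_fn(u, v)
            grid[i, j, 6] = mask_fn(u, v)  # 1 = visible, 0 = trimmed
    return grid

# Placeholder surface: a quarter cylinder of radius 1 and height 2.
surf = lambda u, v: np.array([np.cos(u * np.pi / 2), np.sin(u * np.pi / 2), 2 * v])
norm = lambda u, v: np.array([np.cos(u * np.pi / 2), np.sin(u * np.pi / 2), 0.0])
mask = lambda u, v: 1.0  # untrimmed patch

print(uv_grid(surf, norm, mask).shape)  # (10, 10, 7)
```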
As illustrated in Figure 4, the UV-Net model architecture is comprised of both CNN and GCN layers to first extract features from the 1D and 2D grids representing curves and surfaces, respectively, and then to capture the topological structure of the 3D shape encoded as a graph with hierarchical graph convolution layers.
Dataset | #Models | B-Rep | Mesh | Sketch | #Categories | Application
ShapeNet [62] | 3M+ | × | ✓ | × | 3,135 | Classification
ModelNet [5] | 12,311 | × | ✓ | × | 40 | Classification
PartNet [64] | 26,671 | × | ✓ | × | 24 | Classification, Segmentation
PrincetonSB [61] | 6,670 | × | ✓ | × | 92 | Classification
AAD [71] | 180 | × | ✓ | × | 9 | Classification
ESB [69] | 867 | × | ✓ | × | 45 | Classification
Thingi10k [63] | 10,000 | × | ✓ | × | 2,011 | Classification
FeatureNet [72] | 23,995 | × | ✓ | × | 24 | Classification
MCB [70] | 58,696 | × | ✓ | × | 68 | Classification
CADNet [73] | 3,317 | × | ✓ | × | 43 | Classification
FABWave [74] | 5,373 | ✓ | ✓ | × | 52 | Classification
Traceparts [75] | 600 | ✓ | ✓ | × | 6 | Classification
SolidLetters [9] | 96,000 | ✓ | × | × | 26 | Classification
MFCAD [76] | 15,488 | ✓ | × | × | 16 | Segmentation
MFCAD++ [81] | 59,655 | ✓ | × | × | 25 | Segmentation
Fusion 360 Segmentation [14] | 35,680 | ✓ | ✓ | × | 8 | Segmentation
CC3D-Ops | 37,000 | ✓ | ✓ | × | - | Segmentation
Fusion 360 Assembly [18] | 154,468 | ✓ | ✓ | × | - | Joint Prediction
AutoMate [19] | 3M+ | ✓ | ✓ | × | - | Joint Prediction
ABC [77] | 1M+ | ✓ | ✓ | × | - | Reconstruction
Fusion 360 Reconstruction [30] | 8,625 | ✓ | ✓ | × | - | Reconstruction
SketchGraphs [20] | 15M+ | × | × | ✓ | - | Reconstruction
CAD as Language [27] | 4.7M+ | × | × | ✓ | - | Reconstruction
Sketch2CAD [24] | 50,000 | × | × | ✓ | - | Reconstruction
CAD2Sketch [28] | 6,000 | × | × | ✓ | - | Reconstruction
DeepCAD [17] | 178,238 | ✓ | × | × | - | Reconstruction
PVar [15] | 120,000 | ✓ | × | × | 60 | Reconstruction, Classification
CADParser [16] | 40,000 | ✓ | × | × | - | Reconstruction
Free2CAD [22] | 82,000 | ✓ | × | ✓ | - | Reconstruction

TABLE III: Overview of the existing common object and mechanical CAD datasets with their properties. For each dataset, the number of CAD models, the representation formats (B-Rep, Mesh, Sketch), and the tasks they are annotated for (Segmentation, Classification, CAD Reconstruction) are reported. The first four rows of the table show the common object datasets annotated for 3D shape classification. The remaining rows list mechanical object datasets, created for CAD analysis.

The 1D curve and 2D surface UV-grids are processed by 1D and 2D convolution and pooling layers, while the weights of the convolution layers are shared among all curves and surfaces in a B-Rep to make them permutation-invariant. The 64-dimensional feature vectors derived from surfaces and curves by the convolution layers of the CNN serve as the node and edge features within a face-adjacency graph G(V, E), where the set of graph nodes V represents the faces (surfaces) in the B-Rep and the set of edges E represents the connections between faces. Subsequently, this graph is introduced to a multi-layer GCN, where the graph convolution layers propagate these features across the graph, enabling the capturing of both local and global structures inherent in the shape. The UV-Net encoder is used as a backbone for supervised and self-supervised learning on labeled and unlabeled datasets, respectively. For CAD classification, which is a supervised task, the UV-Net encoder is followed by a 2-layer classification network to map the learned features to classes, and the model is trained in an end-to-end manner on 3 annotated datasets: SolidLetters [9], FabWave [74], and FeatureNet [72]. SolidLetters is currently the biggest synthetic 3D B-Rep dataset with a great variation in both geometry and topology annotated for a classification task. It consists of 96,000 3D shapes which represent 26 English alphabet letters (a-z) with different fonts and dimensions.

For CAD retrieval, however, the premise is the absence of labeled data, and thus the UV-Net encoder needs to be trained in a self-supervised manner. This leads to the utilization of deep learning models designed for self-supervised training, such as Graph Contrastive Learning (GCL) [83] or Graph AutoEncoders (GAE) [60]. For training the UV-Net encoder as a self-supervised model, GCL is leveraged to apply transformations on the face-adjacency graph and make positive pairs for each B-Rep sample. These transformations can be performed in various ways, such as randomly deleting nodes or edges with uniform probability, or extracting a random node and its n-hop neighbors within a graph. Assuming that each B-Rep and its transformed version are positive pairs, the UV-Net encoder extracts the shape embeddings of each pair as {hi, hj}, and a 3-layer non-linear projection head with ReLU activations transforms these embeddings into latent vectors zi and zj, respectively. For a batch comprising N B-Rep samples, the latent vector for each sample, along with its corresponding positive pair, is computed.
Fig. 4: Schematic representation of the UV-Net model architecture [9]. The model takes B-Rep data as input, generating grid-structured features (point coordinates, surface normals, trimming mask) for surfaces and connecting curves. These UV-grid mappings are further processed through CNN and GCN architectures to learn feature embeddings for the overall graph and its individual nodes.

The entire model is then trained in an end-to-end manner with the objective of bringing the embedding of each sample closer to its positive pair. Simultaneously, the model works to treat the remaining 2(N − 1) B-Reps as negative examples, aiming to push them further away from the positive pair. Through this process, the model learns to capture the intrinsic features and patterns within the data without the need for labeled examples. It essentially leverages the relationships within the data itself, generated through transformations or augmentations, to learn meaningful representations. This makes contrastive learning a self-supervised learning approach which is particularly useful in scenarios where obtaining labeled data is challenging or impractical. For retrieving similar CAD models, the model is trained on an unlabeled dataset like ABC. Subsequently, embeddings for random samples from the test set are used as queries, and their k-nearest neighbors are calculated within the UV-Net shape embedding space.
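This objective corresponds to the standard normalized-temperature cross-entropy (NT-Xent) contrastive loss used in graph contrastive learning. A compact NumPy sketch is given below, with random vectors standing in for the projected embeddings z_i, z_j (batch layout and temperature are illustrative assumptions):

```python
import numpy as np

def nt_xent(Z, tau=0.1):
    """NT-Xent loss over 2N L2-normalized embeddings, where rows
    2k and 2k+1 are the two augmented views of the same B-Rep."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Z @ Z.T / tau                    # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)         # exclude self-similarity
    pos = np.arange(len(Z)) ^ 1            # index of each row's positive pair
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(Z)), pos].mean()

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 128))              # batch of N = 4 positive pairs
print(nt_xent(Z))
```

Minimizing this loss pulls each pair of views together while treating the other 2(N − 1) samples in the batch as negatives, exactly as described above.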
synthetic segmentation dataset consisting of 15, 488 3D CAD
Likewise, the method proposed in [13] utilizes geometry models with planar faces, each annotated with 16 types of
as a means of self-supervision, extending its application to machining feature such as chamfer, triangular pocket, through
few-shot learning. Specifically, the method involves training hole, etc. Figure 5 shows some examples from this dataset.
an encoder-decoder structure to rasterize local CAD geometry, The CAD segmentation task can be framed as a node classi-
taking CAD B-Reps as input and generating surface rasteri- fication problem, with each 3D solid represented as a face adja-
zations as output. B-Reps are assembled piecewise through cency graph. The graph nodes, corresponding to B-Rep faces,
explicitly defined surfaces with implicitly defined boundaries. are then classified into distinct machining feature classes.
Accordingly, the encoder of this approach adopts the hier- CADNet [81] is one of the first proposed methods in this
archical message passing architecture of SB-GCN proposed regard which represents the B-Rep solid as a graph encoding
in [19] to effectively capture boundary features for encoding face geometry and topology, and utilizes a hierarchical graph
B-Rep faces. The decoder, in turn, reconstructs faces by convolutional network called Hierarchical CADNet to classify
simultaneously decoding the explicit surface parameterization the graph nodes (or solid faces) into different machining fea-
and the implicit surface boundary. The embeddings learned tures. For evaluation, this method not only leveraged MFCAD
through self-supervised learning in this method serve as input dataset, but it also created and released the extended version
features for subsequent supervised learning tasks, including of the dataset, MFCAD++, which consists of 59, 655 CAD
CAD classification on the FabWave dataset. Notably, with very models with 3 to 10 machining features including both planar
limited labeled data (ranging from tens to hundreds of exam- and non-planar faces. The UV-Net method, as introduced in
ples), the method outperforms previous supervised approaches the previous section, also addresses CAD segmentation in the
while leveraging smaller training sets. This underscores the same fashion, training its encoder in a supervised manner on
effectiveness of the differentiable CAD rasterizer in learning the MFCAD dataset. However, translating the B-Rep into a
intricate 3D geometric features. face adjacency graph results in the loss of some information
Fig. 5: Examples from the MFCAD dataset for manufacturing-driven segmentation. Each solid face is labeled with one of 16 types of machining features [76] (rectangular_through_slot, triangular_through_slot, rectangular_passage, triangular_passage, 6sides_passage, rectangular_through_step, 2sides_through_step, slanted_through_step, rectangular_blind_step, triangular_blind_step, rectangular_blind_slot, rectangular_pocket, triangular_pocket, 6sides_pocket, chamfer, stock).

Fig. 6: (left) The topology of a solid can be defined as a set of face, edge, coedge, and vertex entities, and it can be traversed by a sequence of walks. (right) The topology can be traversed by a sequence of walks from the starting entity (in red). The walks can be Edge, Face, Next, Previous and Mate. As an example, here the starting entity is the red coedge, followed by Mate, Next, Mate, Edge walks [14].

However, translating the B-Rep into a face adjacency graph results in the loss of some information regarding the relative topological locations of nearby entities. Moreover, graph representations constructed based on UV coordinates lack invariance to translation and rotation.

BRepNet [14] is the first method specifically designed for B-Rep segmentation based on deep learning and, notably, it accomplishes this without introducing any coordinate information to the network. It operates directly on B-Rep faces and edges, exploiting compact information derived from their topological relationships for B-Rep segmentation. The motivation behind the BRepNet approach stems from the convolutional operation in CNNs used for image processing. In this operation, local features are aggregated by sliding a small window called a filter or kernel over the grid data and performing element-wise multiplication between the filter and the overlapping grid cells, then pooling the results. This concept is extended to B-Reps, allowing the identification of a collection of faces, edges, and coedges at precisely defined locations relative to each coedge in the data structure. The coedges form a doubly linked list of directed edges, representing the neighboring structures of B-Rep entities. Each coedge also retains information about its parent face and edge, its neighbor (mating) coedge, and pointers to the next and previous coedges in the loop around a face. Figure 6 illustrates a topology example traversed by a sequence of walks from a given coedge (in red) to some of its neighboring entities, such as the mating coedge, the next and previous coedges, faces and edges. Information about geometric features around a coedge, including face and edge type, face area, edge convexity and length, and coedge direction, is encoded as one-hot vectors and concatenated in a predetermined order, forming feature matrices X^f, X^e, X^c of faces, edges and coedges, respectively. These matrices are then passed into a neural network where convolution operations are performed through matrix/vector multiplication to recognize patterns around each coedge. Additionally, the performance of BRepNet is evaluated on a segmentation dataset from the Fusion 360 Gallery, which is released alongside the BRepNet method as the first segmentation dataset comprising real 3D designs. As shown in Table III, this dataset comprises 35,680 3D shapes, each annotated with 8 types of modeling operations utilized to create the faces in the respective model. Figure 7 shows some examples from this dataset.

Fig. 7: Examples from the Fusion 360 Gallery dataset annotated for construction-based segmentation. Each solid face is labeled with the operation used in its construction (ExtrudeSide, ExtrudeEnd, CutSide, CutEnd, Fillet, Chamfer, RevolveSide, RevolveEnd) [14].
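The coedge pointer structure and the walks of Figure 6 can be sketched with a small, hypothetical data structure (this is illustrative only, not BRepNet's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Coedge:
    """A directed edge in a loop, with the pointers BRepNet walks over."""
    face: str                # parent face
    edge: str                # parent edge
    next: "Coedge" = None    # next coedge in the loop
    prev: "Coedge" = None    # previous coedge in the loop
    mate: "Coedge" = None    # matching coedge on the adjacent face

def walk(start: Coedge, steps: list) -> Coedge:
    """Follow a sequence of topological walks, e.g. ['mate', 'next']."""
    c = start
    for step in steps:
        c = getattr(c, step)
    return c

# Two faces sharing one edge: each side of the edge has its own coedge.
a = Coedge(face="F1", edge="E1")
b = Coedge(face="F2", edge="E1")
a.mate, b.mate = b, a
a.next = a.prev = a   # trivial single-coedge loops, for brevity
b.next = b.prev = b

print(walk(a, ["mate", "next", "mate"]).face)  # -> 'F1'
```

Concatenating the feature vectors of the entities reached by a fixed set of such walks, for every coedge, yields the input rows over which BRepNet's learned filters operate.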
The self-supervised method proposed in [13] is also assessed on a segmentation task by initially pre-training the network on a subset of 27,450 parts from the Fusion 360 Gallery segmentation dataset. This pre-training is conducted in a self-supervised manner without utilizing face annotations. Subsequently, the pre-trained network undergoes fine-tuning in a supervised manner, being exposed to only a limited number of annotated parts to demonstrate the method's performance in a few-shot setting on both the Fusion 360 and MFCAD datasets. CADOps-Net [25] draws inspiration from CAD segmentation methods like UV-Net [9] and BRepNet [14], which segment B-Reps into distinct faces based on their associated CAD operations, and proposes a neural network architecture that takes the B-Rep data of a 3D shape as input and learns to decompose it into various operation steps and their corresponding types. Besides, it introduces the CC3D-Ops dataset, comprising 37,000 CAD models annotated with per-face CAD operation types and construction steps.3

3 https://cvi2.uni.lu/cc3d-ops/

C. CAD Assembly

The physical objects we encounter all around us are primarily intricate assemblies built up by CAD designers by designing and then aligning multiple smaller and simpler parts
through CAD software. The meticulous pairing of parts in CAD is a laborious manual process, consuming approximately one-third of designers' time [19]. It entails precise positioning of parts in relation to each other and specifying their relative movement. Consequently, optimizing this process is crucial for enhancing the efficiency of CAD systems. This issue has been explored in various studies using deep learning methods in recent years [86]–[93] to simplify part assembling and open avenues for various applications such as robotic assembly [94], CAD assembly synthesis [95], part motion prediction [96], robot design optimization [97] and similarity analysis in CAD assemblies [98]. However, all these approaches operate with non-parametric data structures like meshes, point clouds, and voxel grids. Consequently, they leverage GDL methods tailored for these data structures, such as DGCNN [99], PCPNet [100], PointNet [8], and PointNet++ [8], to learn the surface representations. These methods primarily adopt a top-down approach to predict the absolute pose of a set of parts for assembly in a global coordinate system. However, this approach lacks support for the parametric variations of parts for modifying the assembly or modeling degrees of freedom, and it may also result in failures when parts cannot achieve complete alignment. Additionally, these methods heavily rely on annotated datasets like PartNet [64], which exclusively provides data in mesh format.

A bottom-up approach to assembling pairs of parts, relying on pairwise constraints and utilizing the joint or contact information available in B-Rep data, can address this issue without requiring class annotations on datasets. However, current B-Rep datasets like ABC [77] and Fusion 360 Gallery [30] lack pairing benchmarks for CAD assemblies, making them unsuitable for training assembly prediction models. AutoMate [19] and JoinABLe [18] are the only two works proposed recently that employ a bottom-up learning approach for pairing parts locally to form assemblies. Additionally, they have released B-Rep datasets along with their methods, providing pairing benchmarks for training models.

1) AutoMate [19]: AutoMate is the first work in this area focusing on CAD assembly by operating on the parametric B-Rep data format. Additionally, this work introduces the first large-scale dataset of CAD assemblies in B-Rep format with mating benchmarks, released to facilitate future research in this field. The AutoMate dataset is made by collecting publicly available CAD designs from the OnShape API, containing 92,529 unique assemblies with an average size of 12 mates each, and 541,635 unique mates. Mating in this work means aligning two parts with pairwise constraints defined based on B-Rep topology. These pairwise constraints are referred to as mates or joints, and they dictate the relative pose and degrees of freedom (DOF) of parts within an assembly. Two parts can be mated through various topological entities, such as faces, edges, and vertices. Therefore, it is essential to learn the feature representation of multiple topological entities at different levels to address the complexities of the CAD assembly problem. In contrast to previous CAD representation learning approaches, like BRepNet [14] and UV-Net [9], which construct a face adjacency graph to capture the homogeneous structure of the B-Rep and focus on learning feature representations for face entities, AutoMate [19] takes a different approach. It aims to learn representations for faces, loops, edges, and vertices by capturing the heterogeneous B-Rep structure. This is achieved through the introduction of the Structured B-Rep Graph Convolution Network (SB-GCN) architecture, a message-passing network designed for learning the heterogeneous graph representation of the B-Rep.

SB-GCN takes a heterogeneous graph as input, where faces F, edges E, vertices V, and loops L serve as graph nodes, and directed bipartite connection sets between them represent graph edges. Specifically, the relations between B-Rep vertices and edges are denoted by V:E and its transpose E:V. Similarly, E:L and L:E represent relationships between B-Rep edges and loops, while L:F and F:L denote connections between faces and loops. Additionally, there are undirected relations (meta-paths) between geometrically adjacent faces, expressed as F:F. Each node is associated with a parametric geometry function encoded as a one-hot feature vector. The network utilizes structured convolutions to generate output feature vectors for all graph nodes. The adjacency structure of different node types is captured in an ordered hierarchy in different network layers. The initial three layers capture relations in the B-Rep hierarchy in a bottom-up order: Vertex to Edge, Edge to Loop, and Loop to Face. Subsequently, the following k layers focus on capturing meta-relations between faces. The final three layers reverse the node relations: Face to Loop, Loop to Edge, and Edge to Vertex. The network makes predictions for mate location and type using two distinct output heads. The mate location prediction head assesses pairs of mating coordinate frames (MCFs) that are adjacent to the selected faces on the two mating parts. Meanwhile, the mate type prediction head anticipates how this pair of MCFs should be mated, by classifying features into different mating categories, i.e., fastened, revolute, planar, slider, cylindrical, parallel, ball, and pin slot. The model is trained on 180,102 mates from the AutoMate dataset, and is integrated as an extension to the Onshape CAD system. This integration assists designers by providing mate recommendations for pairs of parts during the CAD design process.
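For illustration, the typed nodes and relation sets of such a heterogeneous B-Rep graph might be laid out as follows (a hypothetical toy solid; AutoMate additionally stores parametric geometry per node):

```python
# Typed nodes of the heterogeneous B-Rep graph: vertices (V), edges (E),
# loops (L) and faces (F).
nodes = {
    "V": ["v0", "v1"],
    "E": ["e0"],
    "L": ["l0", "l1"],
    "F": ["f0", "f1"],
}

# Directed bipartite relation sets, stored with their transposes,
# plus the undirected face-to-face meta-path F:F.
relations = {
    ("V", "E"): [("v0", "e0"), ("v1", "e0")],
    ("E", "L"): [("e0", "l0"), ("e0", "l1")],
    ("L", "F"): [("l0", "f0"), ("l1", "f1")],
    ("F", "F"): [("f0", "f1")],  # geometric adjacency between faces
}
relations[("E", "V")] = [(b, a) for a, b in relations[("V", "E")]]
relations[("L", "E")] = [(b, a) for a, b in relations[("E", "L")]]
relations[("F", "L")] = [(b, a) for a, b in relations[("L", "F")]]

# SB-GCN layer order over these sets: V->E, E->L, L->F,
# then k layers of F:F, then F->L, L->E, E->V.
print(len(relations), "relation sets over", sum(map(len, nodes.values())), "nodes")
```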
2) JoinABLe [18]: JoinABLe is another recent method that employs a bottom-up approach to predict joints (or mates) between parts based on pairwise constraints. In contrast to AutoMate, which relies on user contact surface selections on parts to rank and recommend multiple mating solutions, JoinABLe identifies joints between parts without being limited to predefined surfaces, doing so automatically and without requiring any user assistance. The Fusion 360 Gallery Assembly dataset is concurrently introduced and made available with the JoinABLe method for the purpose of training and evaluating the JoinABLe model. This dataset comprises two interconnected sets of data: Assembly and Joint. These datasets are collected from user-designed CAD models publicly accessible in the Autodesk Online Gallery. The Assembly data encompasses 8,251 assemblies, totaling 154,468 individual parts, along with their corresponding contact surfaces, holes, joints, and the associated graph structure. The Joint data includes 23,029 distinct parts, incorporating 32,148 joints between them. JoinABLe is trained using the joint information
extracted from the Fusion 360 Gallery Assembly-Joint dataset. The Assembly dataset [18] is a subset of designs in which each CAD model is made of multiple parts. In the CAD design context, each assembly or 3D shape is a collection of parts that are joined together, and a set of assemblies can make a CAD design or 3D object.

Fig. 8: An example showing how joints are defined between parts (solids) in the Fusion 360 Assembly-Joint dataset. Given a pair of parts, JoinABLe predicts the joint axis and pose between them according to the joint defined by the ground truth [18].

Two parts are aligned through their joint axes, each possessing a specific direction. The joint axis on each part can be defined on either a face or edge entity, featuring an origin point and a directional vector. For instance, on a circular surface entity, the origin point is the center of the circle and the direction vector is the normal vector. This joint axis information for pairs of parts is provided in the Fusion 360 Gallery Assembly-Joint dataset, which is a subset of the Fusion 360 Gallery Assembly dataset, as ground truth for training a joint prediction deep learning model. This dataset contains 23,029 parts with 32,148 joints between them. As shown in Figure 8, given two parts, JoinABLe is designed to predict the parametric joint between them, encompassing joint axis prediction (origin points on the two parts and direction vectors) and joint pose prediction. To facilitate this, each part is modeled as a graph G(V, E) with B-Rep faces and edges serving as graph nodes, and their adjacency relationships functioning as graph edges. B-Rep faces are encoded as one-hot feature vectors, representing surface types like plane and cylinder, along with a flag indicating if the surface is reversed relative to the face. Similarly, one-hot vectors for B-Rep edges encompass curve characteristics such as line and circle, curve length, and a flag denoting whether the curve is reversed with respect to the edge. These one-hot vectors function as graph node features.

For two parts represented by graphs G1 and G2 with N and M nodes, respectively, the joint graph Gj is constructed to illustrate connections between the two graphs. This connection information in Gj can be expressed as a binary matrix of dimensions N × M. In alignment with the ground truth joints provided in the Joint dataset, only one matrix element, representing the joint between two entities across the two parts, should be set to 1, while the others remain 0. The JoinABLe model aims to find that one positive element, and it comprises three main components: an encoder, joint axis prediction, and joint pose prediction. The encoder follows a Siamese-style architecture with two distinct MLP branches. One branch focuses on learning node features that represent B-Rep faces, while the other concentrates on learning node features representing B-Rep edges. The learned embeddings for faces and edges in each part are subsequently concatenated to form the node embeddings for the corresponding graph. These node embeddings are then input into a Graph Attention Network v2 (GATv2) [101] which captures the local neighborhood structure within each graph through message passing.

The joint axis prediction is formulated as a link prediction task, aiming to predict a connection between the graphs G1 and G2 by linking two nodes. This involves edge convolution on the joint graph Gj, which illustrates the connections between the two graphs and is updated with the node features learned by the encoder in the previous step. Assuming that h_v, h_u represent the learned features for the two nodes v, u from G1, G2, respectively, the edge convolution to learn the connection between the two nodes in the two graphs is performed as follows:

h_uv = Φ(h_u ⊕ h_v),   (1)

where Φ(·) is a 3-layer MLP applied to the concatenated features of h_u and h_v. Following the edge convolution, a softmax function is applied to normalize the features and predict the most probable link between the nodes in Gj. After predicting the joint axes to align the two parts, the pose prediction head employs a neurally guided search approach to iterate through the top-k joint axis predictions. As a supplementary work, JoinABLe has also proposed assembling a multi-part design using only the individual parts and the sequence of part pairs derived from the Assembly dataset. However, in real-world applications, this well-defined assembly sequence and the corresponding assembly graph might not be available for the network to use. Additionally, a misalignment in any of the assembly steps could result in an incorrect overall assembly. Therefore, it is suggested that for large and complex assemblies, a combination of top-down and bottom-up approaches might be more effective.
hot feature vectors, representing surface types like plane and to assemble them [102]. The former question can be addressed
cylinder, along with a flag indicating if the surface is reversed by leveraging similarity analysis and retrieval methods to
relative to the face. Similarly, one-hot vectors for B-Rep edges identify suitable pairs of parts. AutoMate and JoinAble, on
encompass curve characteristics such as line and circle, curve the other hand concentrate on answering the latter question
length, and a flag denoting whether the curve is reversed by learning the process of mating two parts in an assembly.
concerning the edge. These one-hot vectors function as graph
node features. V. CAD C ONSTRUCTION WITH G ENERATIVE D EEP
For two parts represented by graphs G1 and G2 with N L EARNING
and M nodes, respectively, the joint graph Gj is constructed to CAD construction with generative deep learning involves
illustrate connections between the two graphs. This connection leveraging advanced GDL methods to automatically generate
information in Gj can be expressed as a binary matrix of or assist in the creation of parametric CAD models. These
dimensions N × M . In alignment with the ground truth approaches can support designers in various ways to streamline
joints provided in the Joint dataset, only one matrix element, the design process. This includes tasks like generating or auto-
representing the joint between two entities across the two parts, completing sketches, as well as generating CAD operations to
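To make the joint axis prediction of Eq. (1) concrete, the following minimal sketch (assuming PyTorch and PyTorch Geometric are available; the feature dimensions, the single GATv2 layer, and all module names are illustrative simplifications, not the authors' exact implementation) scores every cross-part node pair with the MLP Φ and normalizes the resulting N × M joint matrix with a softmax:

import torch
import torch.nn as nn
from torch_geometric.nn import GATv2Conv

class JointAxisPredictor(nn.Module):
    def __init__(self, in_dim=64, hid_dim=128):
        super().__init__()
        # Siamese-style encoder: one MLP branch for B-Rep face features,
        # one for B-Rep edge features (shared across the two parts).
        self.face_mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.edge_mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.gnn = GATv2Conv(hid_dim, hid_dim)   # message passing within each part graph
        self.phi = nn.Sequential(                # the 3-layer MLP of Eq. (1)
            nn.Linear(2 * hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1))

    def encode(self, x_face, x_edge, edge_index):
        # Nodes of the part graph are B-Rep faces and edges; their one-hot
        # descriptors are embedded, then refined by GATv2 over the adjacency.
        h = torch.cat([self.face_mlp(x_face), self.edge_mlp(x_edge)], dim=0)
        return self.gnn(h, edge_index)

    def forward(self, part1, part2):
        h1 = self.encode(*part1)                 # (N, hid_dim)
        h2 = self.encode(*part2)                 # (M, hid_dim)
        # Edge convolution on the dense joint graph G_j: concatenate every
        # cross-part node pair (u, v) and score it with phi.
        pair = torch.cat([h1[:, None, :].expand(-1, h2.size(0), -1),
                          h2[None, :, :].expand(h1.size(0), -1, -1)], dim=-1)
        logits = self.phi(pair).squeeze(-1)      # (N, M) joint matrix
        return torch.softmax(logits.flatten(), dim=0).view_as(logits)

Normalizing the flattened N × M logits with a single softmax mirrors the fact that exactly one entry of the ground-truth joint matrix is positive.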
V. CAD CONSTRUCTION WITH GENERATIVE DEEP LEARNING

CAD construction with generative deep learning involves leveraging advanced GDL methods to automatically generate or assist in the creation of parametric CAD models. These approaches can support designers in various ways to streamline the design process. This includes tasks like generating or auto-completing sketches, as well as generating CAD operations to construct a 3D model based on the designed sketch. In this section, we categorize these methods into five distinct groups: 1) methods focusing on 2D sketches, aiming to automate the sketching process in a 2D space as an initial step before developing 3D CAD models, 2) methods targeting 3D CAD model reconstruction given the sketch of the model, 3) methods generating CAD construction sequences, specifically focused on sketch and extrude operations, for CAD construction, 4) methods which perform direct B-Rep synthesis to generate 3D CAD models, and 5) methods generating 3D CAD models from Point Cloud data.
A. Engineering 2D Sketch Generation for CAD

Engineering 2D sketches form the basis of 3D CAD design. A 2D sketch is composed of a collection of geometric primitives, such as vertices, lines, arcs, and circles, with their corresponding parameters (e.g., radius, length, and coordinates), and explicitly imposed constraints between primitives (e.g., perpendicularity, orthogonality, coincidence, parallelism, symmetry, and equality) determining their final configuration. These 2D sketches can then be extruded to make 3D designs. Synthesizing parametric 2D sketches and learning their encoded relational structure can save a lot of time and effort for designers when designing complex engineering sketches. However, leveraging deep learning approaches in this regard requires large-scale datasets of 2D engineering sketches. Most of the existing large-scale datasets provide hand-drawn sketches of common objects, such as furniture, cars, etc. The QuickDraw dataset [103] is collected from the Quick, Draw! online game [104], and the Sketchy dataset [79] is a collection of paired pixel-based natural images and their corresponding vector sketches. These sketch datasets are based on vector images of sketches, not their underlying parametric relational geometry. For reasoning about parametric CAD sketches and inferring their design steps using deep learning models, a large-scale dataset of parametric CAD sketches is needed.
1) SketchGraphs [20]: The first dataset introduced in this regard is SketchGraphs, which is a collection of 15 million real-world 2D CAD sketches from the Onshape platform [40], each of which is represented as a geometric constraint graph where the nodes are the geometric primitives and the edges denote the designer-imposed geometric relationships between primitives. An open-source data processing pipeline is also released along with the dataset to facilitate further research in this area.4 SketchGraphs not only provides the underlying parametric geometry of the sketches, but it also provides ground truth construction operations for both geometric primitives and the constraints between them. Therefore, it can be used for training deep learning models for different generative applications facilitating the design process. One of these applications, which could be used as an advanced feature in CAD software, is to automatically build the parametric CAD model given a hand-drawn sketch or a noisy scan of the object. The SketchGraphs processing pipeline can be used to produce noisy renderings of the sketches. In this way, a large-scale dataset of paired geometric sketches and their noisy rendered images can be created to train a deep learning model for predicting the design steps of a sketch given its hand-drawn image. The 3D design of the model can then be obtained by extruding the designed 2D sketch. In [20], an auto-regressive model is proposed which employs the SketchGraphs dataset for two use cases: 1) Autoconstrain, which is the conditional completion of a sketch by generating constraints between primitives given the unconstrained geometry, and 2) Generative modeling, which is auto-completing a partially designed sketch by generating construction operations for adding the next primitives and constraints between them. Although most CAD software has a built-in constraint solver to be used in the design process, these generative methods are useful when an unconstrained sketch is uploaded to the software as a drawing scan and the designer needs to find the constraints between the sketch primitives and/or complete the sketch. This problem is quite similar to program synthesis or induction in constraint programming. The SketchGraphs dataset and its generative methods can be a good baseline for future work in this direction. In the following, the two generative use cases are described in more detail.

4 https://github.com/PrincetonLIPS/SketchGraphs
Let us assume a sketch represented by a multi-hypergraph G = (V, E) where nodes V denote primitives and edges E denote constraints between them. In this graph, each edge might connect one or more nodes, and multiple edges might share the same set of connected nodes. An edge with a single node is indicated as a self-loop showing a single constraint (e.g., length) for a single primitive (e.g., line), and a hyperedge applies to three or more nodes (e.g., a mirror constraint applied on two primitives while assigning one more primitive as an axis of symmetry). An example of a sketch graph is illustrated in Figure 3. Each constraint is identified by its type, and each primitive is identified by its type and its parameters (different primitives might have different numbers of parameters). For the Autoconstrain task, all the graph nodes (primitives) are given, and the model is trained in a supervised manner to predict the sequence of graph edges considering the ground truth ordering of constraints in the dataset. This problem can be seen as an example of graph link prediction [105] which predicts the induced relationships between the graph nodes. Starting from the first construction step, the model first predicts which node should be connected to the current node and then creates a link (edge) between these two neighboring nodes. Then it predicts the type of this edge (constraint). For the generative modeling task, which auto-completes a partially completed sketch (graph) by generating new primitives (nodes) and constraints between them (edges), the primitives are only represented by their type (the primitive parameters are ignored) and the constraints are represented by both their type and their numerical or categorical parameters. As an example, if the constraint between two primitives is distance, its parameter could be a scalar value denoting the Euclidean distance between the two primitives. However, while this model predicts both type and parameters for constraints, it only predicts the type of the primitives, not their parameters, and the initial coordinates of the primitives might not fit the sketch and constraints. Therefore, the final configuration of the primitive coordinates in the sketch needs to be found by the CAD software's built-in geometric constraint solver. This issue is addressed in the next proposed work for 2D sketch generation, CurveGen-TurtleGen [21], which is introduced in Section V-A2.
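The multi-hypergraph described above can be made concrete with a few lightweight Python containers; the field names below are our own illustration, not the actual SketchGraphs schema:

from dataclasses import dataclass, field

@dataclass
class Primitive:                 # graph node
    kind: str                    # e.g. "line", "arc", "circle"
    params: tuple = ()           # e.g. endpoint coordinates, radius

@dataclass
class Constraint:                # (hyper)edge over one or more nodes
    kind: str                    # e.g. "coincident", "length", "mirror"
    nodes: tuple = ()            # indices of the primitives it constrains
    params: tuple = ()           # e.g. a scalar distance value

@dataclass
class SketchGraph:
    primitives: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

# A single-node edge (self-loop) encodes a unary constraint such as a fixed
# length; a 3-node hyperedge can encode e.g. a mirror constraint where the
# third primitive acts as the symmetry axis.
g = SketchGraph()
g.primitives += [Primitive("line", (0, 0, 1, 0)),
                 Primitive("line", (0, 1, 1, 1)),
                 Primitive("line", (0, -1, 1, -1))]
g.constraints += [Constraint("length", nodes=(0,), params=(1.0,)),
                  Constraint("mirror", nodes=(1, 2, 0))]   # axis = primitive 0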
2) CurveGen-TurtleGen [21]: As discussed in Section V-A1, the generative models in SketchGraphs rely on the built-in sketch constraint solver in CAD software to set the final configuration of the produced sketch. This issue is addressed by two generative models, CurveGen and TurtleGen, proposed in [21], which encode constraint information implicitly in the geometric coordinates so as to be independent of the sketch constraint solver. In this work, the primitive types are limited to lines, arcs, and circles, as they are the most common primitives used in 2D sketches, and it is considered that the constraints between the primitives in sketches should be defined in a way that the geometric primitives can form closed profile loops. In this regard, this method proposed two different representations for engineering sketches: the Sketch Hypergraph representation, which is used by CurveGen, and the Turtle Graphics representation, which is used by TurtleGen. In the Sketch Hypergraph representation, the sketch is represented as a hypergraph G = (V, E), where a set of vertices V = {ν1, ν2, ..., νn} with their corresponding 2D coordinates νi = (xi, yi) are encoded as graph nodes, while E denotes a set of hyperedges connecting two or more vertices to make different primitives. The primitive type is defined by the cardinality of the hyperedge. For example, the primitive line is made by two connected vertices, an arc is made by three connected vertices, and a circle can be seen as a set of four connected vertices. Figure 9 shows an example of a simple sketch consisting of 12 vertices and 9 hyperedges.

Fig. 9: The illustration showcases a basic sketch composed of 12 vertices and 9 hyperedges of types line, circle, and arc. On the right side, a sequence of commands for rendering this sketch, following the grammar proposed by [21], is presented. Each line connects 2 vertices denoted by ∆, every arc is defined by traversing 3 vertices, and circles pass through 4 vertices. Notably, in each loop, the initiation point of each entity coincides with the termination point of the preceding entity in the sequence.

This representation is used by CurveGen, which is an autoregressive Transformer based on PolyGen [106]. PolyGen is an autoregressive generative method for generating 3D meshes using Transformer architectures, which are able to capture long-range dependencies. The mesh vertices are modeled unconditionally by a Transformer, and the mesh faces are modeled conditioned on the mesh vertices by a combination of Transformers and pointer networks [107]. Similar to PolyGen, CurveGen also directly generates the sketch hypergraph representation by first generating the graph vertices V, which are used to make curves, and then generating the graph hyperedges E conditioned on the generated vertices as follows:

p(G) = p(E|V)p(V). (2)

In this way, the network predicts the precise coordinates of each primitive, while the type of primitive is implicitly encoded in the hyperedges which group vertices together to make different types of primitives. Therefore, this method is independent of any constraint solver for finding the final primitive configuration. By having the precise coordinates of the sketch curves, the constraints between them can be automatically obtained as a post-processing step. However, implicit inference of constraints makes editing the sketch in the software more difficult. If the designer wants to change one of the constraints between the primitives, for example the scale of a distance, this change will not propagate through the whole sketch and its primitives, because the exact positioning of the primitives is essentially fixed. In the Turtle Graphics representation, the sketch is represented by a sequence of drawing commands, pen-down, pen-draw, pen-up, which can be executed to form an engineering sketch in the hypergraph representation. The TurtleGen network is an autoregressive Transformer model generating a sequence of drawing commands to iteratively draw a series of closed loops forming the engineering sketch. In this way, each sketch is represented as a sequence of loops, and each loop is made by a LoopStart command, which lifts the pen, displaces it to the specified position, and puts it down, followed by parametric curves specified by Draw commands. In Figure 9, a simple sketch with its corresponding command sequence is illustrated. The sketch encompasses 2 loops and 9 Draw commands, including arc, line, and circle types. The initial loop starts at a position ∆ = (int, int) and comprises 4 arcs. Each arc originates from the endpoint of the preceding arc, passing through 2 additional vertices. This loop transitions into a sequence of 4 lines. The first line begins where the last arc ended, traversing one more vertex, and this pattern repeats for the subsequent lines. The second loop is a circle that begins at another position ∆ and passes through 3 additional vertices. CurveGen and TurtleGen have been evaluated on the SketchGraphs dataset, demonstrating superior performance compared to the generative models proposed in SketchGraphs. We refer the reader to [21] for further details on the models' architecture, training, and evaluation settings.
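As a toy illustration of the Turtle Graphics representation, the following Python sketch executes a command sequence of the kind described above and recovers vertices and hyperedges of the sketch hypergraph; the command names and tuple layouts are assumptions for illustration, not TurtleGen's exact grammar:

def execute(commands):
    vertices, hyperedges = [], []
    pen = None
    for cmd, *args in commands:
        if cmd == "LoopStart":        # lift pen and move to (dx, dy)
            pen = tuple(args)
        elif cmd == "Draw":           # draw one curve through given vertices
            pts = [pen] + [tuple(p) for p in args]
            start = len(vertices)
            vertices.extend(pts)
            # hyperedge cardinality implies the primitive type:
            # 2 vertices -> line, 3 -> arc, 4 -> circle
            hyperedges.append(tuple(range(start, start + len(pts))))
            pen = pts[-1]             # next entity starts where this one ends
    return vertices, hyperedges

# e.g. one loop made of a line and an arc
cmds = [("LoopStart", 0, 0),
        ("Draw", (1, 0)),             # line: 2 vertices
        ("Draw", (1.5, 0.5), (1, 1))] # arc: 3 vertices
print(execute(cmds))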
3) SketchGen [26]: Concurrently with CurveGen-TurtleGen, SketchGen, an autoregressive generative method, was proposed based on PolyGen [106] and pointer networks [107]. Unlike PolyGen, SketchGen aims to capture the heterogeneity of the primitives and constraints in sketches, where each type of primitive and constraint might have a different number of parameters, each of a different type, and therefore a representation of a different size. The selection of the input sequence representation has a great impact on the performance of Transformers, and transforming heterogeneous constraint graphs of sketches into an appropriate sequence of tokens is considerably challenging. One simple solution is padding all the primitives' and constraints' representations to make them of the same size. However, this technique is inefficient and inaccurate for complex sketches. SketchGen proposed a language with a simple syntax to describe heterogeneous constraint graphs effectively. The proposed language for CAD sketches encodes the constraint and primitive parameters using a formal grammar. The terminal symbols for encoding the type and parameters of primitives are {Λ, Ω, τ, κ, x, y, u, v, a, b} and for constraints are {Λ, ν, λ, µ, Ω}. The start and end of a new primitive or constraint sequence are marked with Λ and Ω, respectively. The primitive type is denoted with τ, and ν denotes the constraint type. κ, x, y, u, v, a, b denote the specific parameters for each primitive, such as coordinates and direction. λ and µ are the specific constraint parameters indicating the primitive reference of the constraint and the part of the primitive it is targeting, respectively. This formal language enables distinguishing different primitive or constraint types along with their respective parameters. For example, the sequence for a line primitive is Λ, τ, κ, x, y, u, v, a, b, Ω, which starts with Λ, followed by the primitive type τ = line, the construction indicator κ, the coordinates of the starting point x and y, the line direction u and v, the line range a, b, and ends with Ω. The sequence for a parallelism constraint is Λ, ν, λ1, µ1, λ2, µ2, Ω, which starts with Λ, followed by the constraint type ν = parallelism, the reference to the first primitive λ1, the part of the first primitive µ1, the reference to the second primitive λ2, the part of the second primitive µ2, and ends with Ω. In this way, each token qi (a sequence of symbols) represents either a primitive or a constraint, and each sketch Q is represented as a sequence of tokens. Similar to PolyGen and CurveGen-TurtleGen, as stated in Eq. (2), the generative model of SketchGen is also decomposed into two parts by first generating the primitives p(P) and then generating the constraints conditioned on primitives p(C|P), as follows:

p(S) = p(C|P)p(P). (3)

Therefore, the generative network learns the distribution of constrained sketches via two autoregressive Transformers, one for primitive generation and the other for conditional constraint generation. The sketch is parsed into a sequence of tokens by initially iterating through all the primitives and expressing them using the language sequences explained above. Subsequently, a similar process is applied to represent all the constraints within the sketch. As illustrated in Figure 10, the input of the primitive generator network is a sequence of concatenated primitive tokens separated by Λ, such as Λ, τ1, κ1, x1, y1, u1, v1, a1, b1, Λ, τ2, κ2, x2, y2, ..., Ω, and the input of the constraint generator is a sequence of concatenated constraint tokens separated by Λ, like Λ, ν1, λ11, µ11, λ12, µ12, Λ, ν2, λ21, ..., Ω.

Fig. 10: A simple illustration of the SketchGen generative approach, comprising two generative networks for primitive and constraint generation. At each generation step, the network produces the next token based on both the input and the previously generated token. The start and end of a new primitive or constraint sequence are denoted by Λ and Ω, respectively. The figure showcases the parameters of a line primitive and a parallelism constraint as an example. More details about the model structure can be found in the original paper [26].

All the primitive and constraint parameters are quantized first and then mapped by an embedding layer to make the input feature vectors for the network. The positional information of the tokens in the sequence is also captured by a positional encoding added to each embedding vector. The resulting sequences of tokens are then fed into the Transformer architectures to generate primitives and constraints. The constraint generator network not only receives the embedded and positionally encoded tokens for constraints as input, but it also receives the embedded and positionally encoded sequence of tokens representing the primitives generated in the previous step, in order to generate constraints conditioned on primitives. Similar to [20], this model is evaluated on the SketchGraphs dataset for the two tasks of constraint prediction given sketch primitives, and full sketch generation from scratch by generating both primitives and constraints sequentially. The final generated sketch needs to be regularized by a constraint solver to remove the potential errors caused by quantizing the sketch parameters.
4) CAD as Language [27]: Concurrently with CurveGen-TurtleGen and SketchGen, CAD as Language is another autoregressive Transformer method proposed for 2D sketch generation based on PolyGen [106]. Unlike CurveGen-TurtleGen, which predicts only the primitive parameters with implicit constraints and independently of a constraint solver, the CAD as Language and SketchGen methods generate both primitives and constraints but depend on a built-in constraint solver to obtain the final configuration of the sketch. However, SketchGen produces primitives and constraints via two separate Transformer networks and does not support arbitrary orderings of primitive and constraint tokens. In SketchGen, the sketch is parsed into a sequence of tokens by initially iterating through all the primitives and then iterating through all the constraints. The CAD as Language method not only handles arbitrary orderings of primitives and constraints, but also generates both primitives and constraints via one Transformer network. Unlike all the previous sketch generation methods, which are evaluated on the SketchGraphs dataset, CAD as Language evaluated its generative method on a new collection of over 4.7 million sketches from the Onshape platform, which avoids the problem of data redundancy in SketchGraphs. This collected dataset and the corresponding processing pipeline are publicly available on Github.5

5 https://github.com/google-deepmind/deepmind-research/tree/master/cadl

CAD as Language uses a method for describing structured objects using Protocol Buffers (PB) [108], which demonstrates more efficiency and flexibility for representing the precise structure of complex objects than the JSON format. In this format, each sketch is described as a PB message. Similar to other Transformer-based methods, the first and most important step in the processing pipeline is to parse sketches into a sequence of tokens. In this method, each sketch (or PB message) is represented as a sequence of triplets (di, ci, fi), where each triplet with index i denotes a token. Each token (triplet) represents only one component (type or parameter) of a primitive or constraint in a sketch, where di is a discrete value denoting the type of object it is referring to, i.e., the type of entity (primitive, constraint, etc.), and ci is a continuous value denoting the parameter value for the corresponding entity. At each time, either di or ci is active and gets a value, and the other one is set to zero. fi is a boolean flag specifying the end of a repeated field (for example, the end of an object containing an entity). Figure 11 shows an example of tokens specifying a line and a point primitive on one of its ends, respectively.

        Triplet            Field
Line:
        (0, 0.0, False)    objects.kind
        (0, 0.0, False)    entity.kind
        (1, 0.0, False)    line.is_construction
        (0, -0.8, False)   line.start.x
        (0, 2.6, False)    line.start.y
        (0, 1.1, False)    line.end.x
        (0, -3.7, False)   line.end.y
Point:
        (0, 0.0, False)    objects.kind
        (1, 0.0, False)    entity.kind
        (0, 0.0, False)    point.is_construction
        (0, 1.1, False)    point.x
        (0, -3.7, False)   point.y
        (0, 0.0, True)     objects.kind

Fig. 11: The description of a simple sketch, consisting of a line and one point on one of its ends, using the language structure proposed by [27]. The active element of each triplet (on the left side) is specified in bold red color, and the corresponding field of the object for each triplet is shown on the right side.

As shown in this example, the first triplet (objects.kind) is always associated with the type of the object the token is referring to. The values in the second triplet depend on the type of object specified in the first triplet. As in this example, d1 = 0 shows that this sequence of tokens is about creating a primitive (like a line), therefore the second triplet specifies the type of the primitive (entity.kind), which is 0 for line and 1 for point. The next triplets in the sequence specify the specific parameters associated with the corresponding primitive identified in the second triplet. For example, the line primitive is defined with a start and an ending point, while the point primitive, which is also repeated in the line primitive, is defined with x, y coordinates.
For interpreting these triplets (tokens), the CAD as Language method also proposed a custom interpreter which receives as input a sequence of tokens, each representing a sketch component (which can be an entity type, a parameter, or any other design step), and converts it into a valid PB message. This interpreter is designed in a way to handle arbitrary orderings of tokens and to make sure that all token sequences can be converted into a valid PB message (sketch). This interpreter guides the Transformer network through the sketch generation process. The Transformer network receives as input a sequence of tokens, and at each time step it outputs a raw value which is passed to the interpreter to infer the corresponding triplet for that value. This triplet is a part of a PB message which makes the final sketch. When the interpreter has inferred the output value, it propagates its interpretation back to the Transformer to guide it through generating the next value. Therefore, the structure this method proposed for parsing a sketch into tokens and interpreting them enables it to generate every sketch component (primitives and constraints) via only one Transformer network, while handling different orderings of input tokens. A conditional variant of the proposed Transformer model is also explored and evaluated in this method, where it is conditioned on an input image of the sketch.
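The triplet stream of Figure 11 can be written out directly; the following minimal Python sketch reproduces those tokens with an illustrative helper type (the actual method emits Protocol Buffer messages rather than Python tuples):

from typing import NamedTuple

class Token(NamedTuple):
    d: int      # discrete value (object/entity kind); 0 when inactive
    c: float    # continuous value (a parameter); 0.0 when inactive
    f: bool     # flag marking the end of a repeated field

line_and_point = [
    Token(0, 0.0, False),   # objects.kind -> a primitive follows
    Token(0, 0.0, False),   # entity.kind  -> 0 = line
    Token(1, 0.0, False),   # line.is_construction
    Token(0, -0.8, False),  # line.start.x
    Token(0, 2.6, False),   # line.start.y
    Token(0, 1.1, False),   # line.end.x
    Token(0, -3.7, False),  # line.end.y
    Token(0, 0.0, False),   # objects.kind -> next object
    Token(1, 0.0, False),   # entity.kind  -> 1 = point
    Token(0, 0.0, False),   # point.is_construction
    Token(0, 1.1, False),   # point.x
    Token(0, -3.7, False),  # point.y
    Token(0, 0.0, True),    # objects.kind with f=True: no more objects
]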
5) VITRUVION [23]: This method is the latest autoregressive generative model proposed for sketch generation. Similar to SketchGen and CAD as Language, VITRUVION also generates both primitives and constraints autoregressively. However, this method is more similar to SketchGen in the sense that primitives and constraints are generated independently through training two distinct Transformer networks. The major contribution of this method compared to the previous ones is conditioning the model on various contexts such as hand-drawn sketches. This contribution is one step forward towards the highly sought feature in CAD software to reverse engineer a mechanical part given a hand drawing or a noisy scan of it. Generating the parametric primitives and constraints of a sketch given a hand drawing or noisy image saves a lot of time and effort in the design process.
Similar to the previous methods, VITRUVION is a generalization of PolyGen [106]. It generates the probability distribution of sketches by first generating primitives P, and then generating constraints C conditioned on primitives. However, in this method, the primitive generation is optionally conditioned on an image as follows:

p(P, C|I) = p(C|P)p(P|I), (4)

where I is a context such as a hand-drawn image of a sketch. However, constraint modeling in this method only supports constraints with one or two reference primitives, and not hyperedges connecting more than two primitives as in the CurveGen-TurtleGen method. The generative models are trained and evaluated on the SketchGraphs dataset for the autoconstraint, autocomplete, and conditional sketch synthesis tasks. As explained in Section V-A1, in the autoconstraint application the network generates constraints conditioned on a set of available primitives, while in the autocomplete task, an incomplete sketch is completed by generating both primitives and constraints. In both cases, the constraint generator network is conditioned on generated primitives, which can be imperfect. VITRUVION increases the robustness of the constraint generator network by conditioning it on noise-injected primitives. The final primitive and constraint parameters are then adjusted via a standard constraint solver. For image-conditional sketch synthesis, the model infers primitives conditioned on a raster image of a hand-drawn sketch. In this regard, an encoder network based on the Vision Transformer [109] architecture is used to obtain the embeddings of the image patches, and the primitive generator network then cross-attends to these patch embeddings for predicting sketch primitives. This idea is based on a similar idea in PolyGen for image-conditioned mesh generation.
This method tokenizes a sketch by representing each primitive and constraint as a tuple of three tokens: value, ID, position. The value token is two-fold, one part indicating the type of primitive or constraint and the other indicating the numerical value of the associated parameter of that primitive or constraint. The ID token indicates the type of parameter specified by the value token, and the position token indicates the ordered index of the primitive or constraint to which the ID and value tokens belong. The ordering of primitives is according to the design steps indicated in the SketchGraphs dataset, and the ordering of constraints is according to the ordering of their corresponding reference primitives.
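As a rough illustration of this three-token scheme, the following sketch tokenizes a single line primitive; the concrete type vocabulary and the number of quantization bins are assumptions for illustration, not VITRUVION's actual values:

from typing import NamedTuple

class Tok(NamedTuple):
    value: int      # either an entity type or a quantized parameter value
    id: int         # which parameter of the entity this value belongs to
    position: int   # ordered index of the primitive/constraint in the sketch

def tokenize_line(pos, x1, y1, x2, y2, n_bins=64):
    q = lambda t: int(t * (n_bins - 1))        # quantize coordinates in [0, 1]
    toks = [Tok(value=0, id=0, position=pos)]  # value 0 ~ "line" type token
    for i, coord in enumerate((x1, y1, x2, y2), start=1):
        toks.append(Tok(value=q(coord), id=i, position=pos))
    return toks

print(tokenize_line(pos=0, x1=0.0, y1=0.0, x2=1.0, y2=0.5))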
Free2CAD is a sequence-to-sequence Transformer network
which receives as input a sequence of user drawn strokes
B. 3D CAD Generation from Sketch
depicting a 3D shape, processes and analyzes them to produce
1) Sketch2CAD [24]: The first work proposed for sequen- a valid sequence of CAD operation commands, which may be
tial 3D CAD modeling by interactive sketching in context is executed to create the CAD model. The main contribution of
Sketch2CAD. This work is a learning-based interactive model- this method is the automatic grouping of the sketch strokes
ing system that unifies CAD modeling and sketching by inter- and the production of the parametric CAD operations for each
preting the user’s input sketches as a sequence of parametric group of strokes sequentially, conditioned on the groups that
3D CAD operations. Given an existing incomplete 3D shape have been reconstructed in the previous iterations. The method
and input sketch strokes added on it by the user, Sketch2CAD is comprised of two phases, namely stroke grouping phase and
first obtains the normal and depth maps of sketching local operation reconstruction phase. In the stroke grouping phase,
context which are introduced to the CAD operator classifica- first each sketch stroke is embedded as a token via a specially
tion and segmentation networks. Classification is done by a designed Transformer encoder network, then it is processed
CNN network predicting the type of CAD operation needed by the Transformer decoder network which produces group
for creating the corresponding input sketch on the 3D shape. It probabilities for the input tokens. In this way, the sketch
receives as input the concatenation of three maps, each of size strokes which might make a specific part of the shape are
256 × 256, i.e., the sketching map representing stroke pixels grouped together. The most closely related work to the stroke
with binary values, and normal and depth maps representing grouping phase is the SketchGNN [56] method which pro-
the local context via rendering the shape through a specific poses a Graph Neural Network approach for freehand sketch
viewpoint. It should be noted that only four types of CAD semantic segmentation. Next, in the operation reconstruction
operations, which are the most widely used ones, are supported phase, the candidate groups are converted into geometric prim-
in this method, i.e., Extrude, Add/Subtract, Bevel, Sweep.
According to the predicted operation type, the parameters of 6 https://geometry.cs.ucl.ac.uk/projects/2020/sketch2cad/
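The classification input can be assembled as follows; this is a minimal PyTorch sketch in which the CNN is a stand-in for the paper's classifier, with only the five-channel 256 × 256 input layout (binary stroke map, 3-channel normal map, depth map) taken from the description above:

import torch
import torch.nn as nn

OPS = ["Extrude", "Add/Subtract", "Bevel", "Sweep"]

classifier = nn.Sequential(
    nn.Conv2d(1 + 3 + 1, 32, 3, stride=2, padding=1), nn.ReLU(),  # 5 input channels
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, len(OPS)))

sketch = torch.zeros(1, 1, 256, 256)   # binary stroke pixels
normal = torch.zeros(1, 3, 256, 256)   # rendered normal map of the local context
depth  = torch.zeros(1, 1, 256, 256)   # rendered depth map of the local context
logits = classifier(torch.cat([sketch, normal, depth], dim=1))
print(OPS[logits.argmax(dim=1).item()])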
2) Free2CAD [22]: One of the main challenges of Sketch2CAD is that it can only handle one CAD operation at a time, which means that it assumes that, at each step, the user drawing added to the shape corresponds to only one CAD operation. It requires the user to draw the sketch sequentially and part-by-part so that it can be decomposed into meaningful CAD operations. Free2CAD is proposed to address this restriction. It is the first sketch-based modeling method in which the user can input the complete drawing of a complex 3D shape without needing expert knowledge of how this sketch should be decomposed into CAD operations, or following a specific strategy for drawing the sketch and working with the system.
Free2CAD is a sequence-to-sequence Transformer network which receives as input a sequence of user-drawn strokes depicting a 3D shape, and processes and analyzes them to produce a valid sequence of CAD operation commands, which may be executed to create the CAD model. The main contribution of this method is the automatic grouping of the sketch strokes and the production of the parametric CAD operations for each group of strokes sequentially, conditioned on the groups that have been reconstructed in the previous iterations. The method is comprised of two phases, namely the stroke grouping phase and the operation reconstruction phase. In the stroke grouping phase, first each sketch stroke is embedded as a token via a specially designed Transformer encoder network, and then it is processed by the Transformer decoder network which produces group probabilities for the input tokens. In this way, the sketch strokes which might make a specific part of the shape are grouped together. The most closely related work to the stroke grouping phase is the SketchGNN [56] method, which proposes a Graph Neural Network approach for freehand sketch semantic segmentation. Next, in the operation reconstruction phase, the candidate groups are converted into geometric primitives with their corresponding parameters, conditioned on the existing geometric context. This step is followed by geometric fitting and grouping correction before passing back the updated groups as geometric context for the next iteration. At the end of this process, the desired CAD shape and the sequence of CAD commands are obtained. The method is also extended to handle long stroke sequences making complex shapes using a sliding window scheme progressively outputting CAD models. Similar to Sketch2CAD [24], Free2CAD also provides a large-scale synthetic dataset of 82,000 paired CAD modeling operation sequences and their corresponding rendered sketches, which are segmented based on their corresponding CAD commands. The code and dataset of this method are publicly available.7 The evaluation results of Free2CAD on both their generated dataset and on the Fusion 360 dataset illustrate its high performance in processing different user drawings and producing CAD commands which make desirable 3D shapes when executed by CAD tools.

7 https://geometry.cs.ucl.ac.uk/projects/2022/free2cad/
3) CAD2Sketch [28]: Unlike Sketch2CAD, which tries to facilitate the design process for non-expert users in an interactive modeling system, CAD2Sketch is designed to assist expert industrial designers. As the name suggests, CAD2Sketch is a method dedicated to synthesizing concept sketches from CAD models. Concept sketching is a preliminary stage in CAD modeling, wherein designers refine their mental conception of a 3D object from rough outlines to intricate details, often using numerous construction lines. An example of a simple concept sketch is depicted in Figure 12, shown on the left side, alongside its refined version displayed on the right side. Notably, concept sketching mirrors the detailed steps in a designer's mind, akin to the stages of CAD modeling, while sketches in the Sketch2CAD dataset typically present the final free-hand sketch without these detailed auxiliary construction lines. CAD2Sketch establishes a large-scale synthetic dataset of concept sketches by introducing a method to convert CAD B-Rep data into concept sketches. The dataset is comparable to the OpenSketch [80] dataset, which has 400 real concept sketches crafted by various expert designers. However, CAD2Sketch targets bridging the gap between synthetic and real data, enabling the training of neural networks on concept sketches. The CAD2Sketch method initially generates construction lines for each operation in a CAD sequence. To avoid overwhelming the sketch with too many lines, a subset of these lines is chosen by solving a binary optimization problem. Subsequently, the opacity and shape of each line are adjusted to achieve a visual resemblance to real concept sketches. The synthetic concept sketches produced by CAD2Sketch closely resemble their corresponding real pairs in OpenSketch, so that even designers can hardly distinguish between them [28]. CAD2Sketch also generates a substantial number of paired sketches and normal maps, utilized for training a neural network to infer normal maps from concept sketches. The dataset contains approximately 6,000 paired concept sketches and normal maps. The trained neural network's generalization to real shapes is evaluated on a test set of 108 CAD sequences from the ABC dataset, yielding promising results. However, it is worth noting that this method is evaluated on a limited number of real sketches, given the relatively small size of the OpenSketch dataset. Additionally, the CAD2Sketch method, built on CAD sequences from existing large-scale CAD datasets, is limited to sequences composed of sketch and extrusion operations.
Fig. 12: An illustration showcasing a conceptual sketch (left) designed for creating a 3D shape. Examples of these sketches can be found in the CAD2Sketch dataset [28]. After refining the concept sketch, the final freehand sketch of the model is achieved (right). Refined sketches like these are available in the Sketch2CAD dataset [24].
C. 3D CAD Command Generation

While previous approaches have introduced synthetic datasets for 3D CAD reconstruction, the absence of a standardized collection of human-designed 3D CAD models with retained CAD command sequences poses a limitation. Similar to the valuable contribution of the SketchGraphs dataset in the area of 2D sketch synthesis, a curated dataset of human-designed 3D CAD models with preserved CAD command sequences would greatly benefit research and the development of practical methods for real-world applications. The Fusion 360 Reconstruction dataset [30] fills this gap as the first human-designed dataset, featuring 8,625 CAD models constructed using a sequence of Sketch and Extrude CAD operations. Accompanying this dataset is an environment known as the Fusion 360 Gym, capable of executing these CAD operations. Each CAD model is represented as a Domain-Specific Language (DSL) program, a stateful language serving as a simplified wrapper for the underlying Fusion 360 Python API. This language keeps track of the current geometry under construction, updated iteratively through a sequence of sketch and extrude commands. The data and corresponding code are publicly available.8 This dataset is benchmarked through the training and evaluation of a machine learning-based approach featuring neurally guided search for programmatic CAD reconstruction from a specified geometry. The approach begins by training a policy, which is instantiated as a Message Passing Network (MPN) [110], [111] with an original encoding of state and action. This training is conducted through imitation learning, drawing insights from ground truth construction sequences. In the subsequent inference stage, the method incorporates a search mechanism, utilizing the learned neural policy to iteratively engage with the Fusion 360 Gym environment until a precise CAD program is discerned. However, a limitation of this method is its assumption that the 2D geometry is given, with the method solely predicting the sketches to be extruded and their extent at each iteration. Subsequently, this dataset has been utilized by generative methods that tackle scenarios where the geometry is not provided and the entire 3D model needs to be synthesized from scratch using Sketch and Extrude operations.

8 https://github.com/AutodeskAILab/Fusion360GalleryDataset/blob/master/docs/reconstru
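As an illustration of such a stateful sketch-and-extrude program, the following hypothetical Python stand-in mimics the flavor of a DSL of this kind; the class and method names are our own assumptions and not the actual Fusion 360 Gym or Fusion 360 Python API:

class SketchExtrudeProgram:
    def __init__(self):
        self.geometry = []          # solids built so far
        self.current_sketch = None  # the stateful "geometry under construction"

    def add_sketch(self, plane):
        self.current_sketch = {"plane": plane, "curves": []}

    def add_line(self, start, end):
        self.current_sketch["curves"].append(("line", start, end))

    def add_extrude(self, distance, operation="new_body"):
        # each extrude consumes the current sketch and updates the geometry
        self.geometry.append((self.current_sketch, distance, operation))
        self.current_sketch = None

prog = SketchExtrudeProgram()
prog.add_sketch(plane="XY")
prog.add_line((0, 0), (1, 0)); prog.add_line((1, 0), (1, 1))
prog.add_line((1, 1), (0, 1)); prog.add_line((0, 1), (0, 0))
prog.add_extrude(distance=0.5, operation="new_body")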
As another effort to infer CAD modeling construction sequences through neural-guided search, a method in which the B-Rep data of each CAD model is expressed in the form of a zone graph, where solid regions constitute the zones and surface patches (curves) form the edges, is proposed in [29]. This representation is then introduced to a GCN to facilitate feature learning and, subsequently, a search method is employed to deduce a sequence of CAD operations that can faithfully recreate this zone graph. However, neither of these two aforementioned methods leverages generative methods for CAD operation generation. In this subsection, the most recent generative methods in this domain are introduced.
1) DeepCAD [17]: The first generative deep learning model proposed for creating 3D CAD commands is DeepCAD. Given the sequential and irregular nature of CAD operations and B-Rep data, it is essential to choose a fixed set of the most commonly employed CAD operations and organize them into a unified structure for utilization by a generative neural network. Drawing inspiration from earlier generative techniques designed for 2D CAD analysis and sketch generation, such as CurveGen-TurtleGen, SketchGen, CAD as Language, and VITRUVION outlined in Section V-A, DeepCAD follows a similar approach by likening CAD operations to natural language. It introduces a generative Transformer network for autoencoding CAD operations. It is worth noting that DeepCAD diverges from previous generative methods by adopting a feed-forward Transformer structure instead of the autoregressive Transformer commonly used in such contexts. Additionally, DeepCAD has constructed and released a large-scale dataset featuring 178,238 CAD models from the Onshape repository generated through Sketch and Extrude operations, accompanied by their respective CAD construction sequences. This dataset surpasses the size of the Fusion 360 Gallery dataset, which contained approximately 8,000 designs. The substantial increase in the number of designs within the DeepCAD dataset enhances its suitability for training generative networks.
In the standardized structure suggested for CAD operations in DeepCAD, the CAD commands are explicitly detailed, providing information on their type, parameters, and sequential index. The Sketch commands encompass curves of types line, arc, and circle along with their respective parameters. Meanwhile, Extrude commands signify extrusion operations of types one-sided, symmetric, and two-sided, and boolean operations of types new body, join, cut, or intersect. These operations are employed to integrate the modified shape with the previously constructed form. A CAD model M is represented as a sequence of curve commands which build the sketch, intertwined with extrude commands. The total number of CAD commands representing each CAD model is fixed to 60, and a padding approach with empty commands is used to fit CAD models with shorter command sequences into this fixed-length structure. Each command undergoes encoding into a 16-dimensional vector representing the whole set of parameters of all commands. In instances where specific parameters for a given command are not applicable, they are uniformly set to −1. As illustrated in Figure 13, the autoencoder network takes a sequence of CAD commands as input, transforms them into a latent space through a Transformer encoder, and subsequently decodes a latent vector to reconstruct a sequence of CAD commands. The generated CAD commands can be imported into CAD software for final user editing.

Fig. 13: Schematic representation of the DeepCAD model architecture [17]. The model, depicted on the left, is a Transformer autoencoder network trained in an unsupervised manner. It takes a sequence of CAD commands as input and reconstructs them. The detailed structure of this autoencoder is shown on the right, where {C1, C2, ..., CN} tokens represent the input CAD command sequence, and {Ĉ1, Ĉ2, ..., ĈN} are the reconstructed commands.
The performance of DeepCAD is evaluated on two tasks: CAD model autoencoding and random CAD model generation. Once the autoencoder network is trained for reconstructing CAD commands, the latent-GAN technique [112] is employed to train a generator and a discriminator on the learned latent space. The generator produces a latent vector z by receiving as input a random vector sampled from a multivariate Gaussian distribution. This latent vector can then be introduced to the trained Transformer decoder to produce the CAD model commands. Their experiments also demonstrate that the model pretrained on the DeepCAD dataset exhibits good generalization capabilities when applied to the Fusion 360 dataset. Notably, these two datasets originate from distinct sources, namely Onshape and Autodesk Fusion 360.
The application of the DeepCAD generative model is further demonstrated in the conversion of 3D point cloud data into a CAD model. In this context, both the generative autoencoder and the PointNet++ encoder [8] are trained concurrently to encode the CAD model into the same latent vector z from the CAD command sequence and its corresponding point cloud data, respectively. During inference, the pretrained PointNet++ [8] encoder embeds the point cloud data into the latent vector z, which is subsequently input to the generative decoder of the pretrained DeepCAD autoencoder to produce the CAD command sequence.
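Random generation with the trained components can be summarized in a few lines; in this schematic PyTorch sketch the generator and decoder are stand-ins (simple MLPs) for the trained latent-GAN generator and Transformer decoder, and all dimensions are illustrative:

import torch
import torch.nn as nn

Z_DIM, NOISE_DIM, N_CMDS = 256, 64, 60
CMD_DIM = 1 + 16  # command type plus the 16-dimensional parameter vector

generator = nn.Sequential(              # maps Gaussian noise to a latent z
    nn.Linear(NOISE_DIM, 512), nn.ReLU(), nn.Linear(512, Z_DIM))
decoder = nn.Sequential(                # stand-in for the Transformer decoder
    nn.Linear(Z_DIM, 512), nn.ReLU(), nn.Linear(512, N_CMDS * CMD_DIM))

noise = torch.randn(1, NOISE_DIM)       # multivariate Gaussian sample
z = generator(noise)                    # latent vector in the learned space
commands = decoder(z).view(1, N_CMDS, CMD_DIM)  # fixed-length command sequence

For the point cloud application, z would instead be produced by the pretrained PointNet++ encoder applied to the scanned points.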
2) SkexGen [31]: Despite the advancements achieved by DeepCAD, which excels in generating a diverse range of shapes, a persistent challenge remains in the limited user control over the generated designs. The ability for users to exert influence over the output and tailor designs to meet specific requirements would be a great advantage for real-world applications. In response to this challenge, SkexGen proposed a novel autoregressive Transformer network with three separate encoders capturing the topological, geometric, and extrusion variations in the CAD command sequences separately. This approach enables more effective and distinct user control over the topology and geometry of the model, facilitating exploration within a broader search space of related designs, which, in turn, results in the generation of more realistic and diverse CAD models.
Inspired by the CurveGen-TurtleGen [21] and DeepCAD [17] methods, a CAD model in SkexGen is represented by a hierarchy of primitives with a sketch-and-extrude construction sequence. Within this hierarchy, a 3D model is composed of 3D solids, where each solid is defined as an extruded sketch. A sketch constitutes a collection of faces, each face represents a 2D surface enclosed by a loop, a loop is formed by one or multiple curves, including lines, arcs, or circles, and a curve represents the fundamental level of the hierarchy. Consequently, CAD models are encoded using five types of tokens for input to the Transformer: 1) topology tokens, denoting the curve type, 2) geometry tokens, specifying 2D coordinates along the curves, 3) end-of-primitive tokens, 4) extrusion tokens, indicating parameters of extrusion and Boolean operations, and 5) end-of-sequence tokens. The autoregressive Transformer network introduced by SkexGen is composed of two independent branches, each trained separately: 1) the "Sketch" branch is composed of two distinct encoders dedicated to learning topological and geometrical variations in sketches, together with a single decoder which receives the concatenated topology and geometry codebooks as input and predicts the sketch subsequence autoregressively; 2) the "Extrude" branch comprises an encoder and decoder specifically designed to learn variations in extrusion and Boolean operations. Furthermore, an additional autoregressive decoder is positioned on top, tasked with learning an effective combination of the geometrical, topological, and extrusion codebooks, thereby generating CAD construction sequences. This intricate tokenization and multifaceted network architecture allows for nuanced and comprehensive control over both topology and geometry, enabling the generation of CAD models by effectively capturing various design aspects.
Given the capacity of SkexGen to independently encode and generate Sketch and Extrude command sequences, this approach is versatile and can be applied to both 2D sketch generation and 3D CAD generation tasks. Evaluation results highlight its proficiency in generating more intricate designs compared to the CurveGen-TurtleGen [21] and DeepCAD [17] methods. Notably, SkexGen excels in supporting multi-step Sketch and Extrude sequences, a capability lacking in DeepCAD, which primarily produces single-step results.
D. 3D CAD Generation with Direct B-Rep Synthesis

1) SolidGen [15]: Creating 3D CAD designs through either the generation of CAD commands or direct B-Rep synthesis comes with a set of advantages and disadvantages. The generative methods introduced in previous works, namely Fusion 360 Reconstruction, DeepCAD, and SkexGen, share a common goal of producing 3D CAD models by generating sequences of 3D CAD operations or commands. These commands are then processed by a solid modeling kernel in a CAD tool to recover the final CAD design in B-Rep format. Generating CAD commands, as opposed to directly creating the B-Rep, offers several advantages. Converting CAD commands into B-Rep format is feasible, while the reverse process is more challenging due to the potential ambiguity where different command sequences may result in the same B-Rep. Additionally, CAD commands are more human-interpretable, enabling users to edit designs for various applications by processing the commands in a CAD tool. However, training models for such tasks necessitates large-scale CAD datasets that retain the history of CAD modeling operations. Consequently, datasets like DeepCAD (comprising around 190,000 models) and Fusion 360 Reconstruction (with approximately 8,000 models) are exclusively constructed for this purpose. In contrast, most large-scale datasets in the field, such as ABC (with over 1 million models), solely provide B-Rep data without a stored sequence of CAD modeling operations. While generating CAD commands offers increased flexibility, interoperability, and user control, an alternative strategy is to directly synthesize the B-Rep. This approach may prove beneficial when leveraging existing large-scale datasets. Moreover, direct B-Rep synthesis allows for the support of more complex curves and surfaces in creating 3D shapes. This stands in contrast to CAD command generative methods, which are constrained to CAD models constructed using Sketch and Extrude commands with a limited list of supported curve types.
SolidGen [15] introduces a method for direct B-Rep synthesis, eliminating the necessity for a history of CAD command sequences. The approach leverages pointer networks [107] and autoregressive Transformer networks to learn B-Rep topology and progressively predict vertices, edges, and faces individually. Running in parallel with SolidGen, the work by [113] is focused on 3D face identification within B-Rep data, given a single 2D line drawing. Drawing inspiration from pointer networks, this approach also employs an autoregressive Transformer network to identify edge loops in the 2D line drawing, and predicts one co-edge index at a time, corresponding to the actual planar and cylindrical faces in the 3D design. However, SolidGen stands out as a more advantageous method. It goes beyond edge loop identification and synthesizes the complete B-Rep data for the 3D shape. Additionally, it supports the representation of all types of faces in the design, providing a more comprehensive and versatile solution. SolidGen also introduces the Indexed Boundary Representation (Indexed B-Rep) to represent B-Reps as numeric arrays suitable for neural network use. The Indexed B-Rep organizes B-Rep vertices, edges, and faces in a clearly defined hierarchy to capture both geometric and topological relations. Structurally, it consists of three lists, denoted as V, E, F, representing vertices, edges, and faces, respectively. In this hierarchical structure, edges E are denoted as lists of indices referring to vertices V, and each face in F indicates an index list referring into edges E. The proposed autoregressive network progressively predicts the B-Rep tokens through training three distinct Transformers for generating vertices, edges, and faces, respectively. Formally, its goal is to learn a joint distribution over the B-Rep B:

p(B) = p(V, E, F), (5)

factorized as:

p(B) = p(F | E, V)p(E | V)p(V). (6)

This structure also allows for conditioning the distribution on an external context c, such as class labels, images, and voxels:

p(B) = p(F | E, V, c)p(E | V, c)p(V | c). (7)

Following training, the generation of an Indexed B-Rep involves sampling vertices, edges conditioned on vertices, and faces conditioned on both edges and vertices. Subsequently, the obtained Indexed B-Rep can be transformed into the actual B-Rep through a post-processing step. For a more comprehensive understanding of these processes, interested readers are encouraged to refer to the original paper [15]. The efficacy of this method is assessed using a refined version of the DeepCAD dataset. Additionally, SolidGen introduces the Parametric Variations (PVar) dataset, purposefully designed for evaluating the model's performance in class-conditional generation tasks. This synthetic dataset comprises 120,000 CAD models distributed across 60 classes.
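The Indexed B-Rep hierarchy can be illustrated for a single planar face; the array layout below follows the V/E/F description above, while the exact numeric format used by SolidGen may differ:

import numpy as np

V = np.array([[0.0, 0.0, 0.0],          # vertices: 3D coordinates
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0]])
E = [(0, 1), (1, 2), (2, 3), (3, 0)]    # edges: index pairs into V
F = [(0, 1, 2, 3)]                      # faces: index lists into E

# Generation mirrors the factorization p(B) = p(F|E,V) p(E|V) p(V):
# sample V, then E conditioned on V, then F conditioned on (E, V), e.g.
#   V = vertex_model.sample()
#   E = edge_model.sample(V)
#   F = face_model.sample(E, V)
# where the three models are the distinct Transformers described above
# (model names here are placeholders).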
ited list of supported curve types. on datasets like DeepCAD and Fusion 360 reconstruction,
21
2) CADParser [16]: In conjunction with SolidGen, CADParser has been introduced to predict the sequence of CAD commands from a given B-Rep CAD model. Unlike previous approaches, which often utilized synthetic CAD datasets or relied on datasets like DeepCAD and Fusion 360 Reconstruction that were restricted to CAD models created with only two operations, namely Sketch and Extrude, CADParser introduces a comprehensive dataset featuring 40,000 CAD models. These models incorporate a broader range of CAD operations, including Sketch, Extrusion, Revolution, Fillet, and Chamfer, providing a more diverse collection of CAD models constructed with five distinct types of CAD operations compared to previous datasets limited to only two. Each CAD model in this dataset is accompanied by both the B-Rep data and the corresponding construction command sequence.

CADParser also introduces a deep neural network architecture, referred to as the deep parser, designed to predict the CAD construction sequence for each B-Rep model. Drawing inspiration from UV-Net and BRepNet, both discussed in Sec. IV as pioneering methods in representation learning for B-Rep data, CADParser treats each CAD model as a graph G = (V, E), where the nodes V represent the faces, edges, and coedges of the model, and E signifies the connections between the graph nodes. The BRepNet architecture serves as the graph encoder backbone, taking the node features and the constructed adjacency matrix as input and extracting local and global features of the graph through graph convolutions and topological walks. Simultaneously, the sequence of CAD commands S = (C1, C2, ..., Cn) that constructs the CAD model is encoded as feature vectors. These are combined with the graph-encoded global and local features and fed into a Transformer decoder to autoregressively predict the next command in the sequence. Similar to previous methods, CAD commands are tokenized by representing each command Ci as a tuple of command type ti and command parameters pi. ti is encoded as a 12-dimensional one-hot vector covering the 12 different command types, and pi is a 257-dimensional vector comprising the quantized 256-dimensional parameter vector of the command and a 1-dimensional index indicating whether the command is used for the corresponding CAD model or not. The length of the CAD command sequence is fixed at 32 for all CAD models for simplicity, with unused command slots indicated by an index set to −1. The Transformer decoder has two separate output branches for predicting the CAD command type ti and the parameter vector pi, respectively. For more details on the model architecture and training process, see [16]. The contribution of this work represents a step forward in generating more diverse CAD models by incorporating various CAD commands.
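To make the tokenization concrete, the following is a minimal sketch of how the (ti, pi) tuples described above could be laid out as arrays; the names and exact layout are ours, and the encoding details in [16] may differ:

import numpy as np

NUM_CMD_TYPES = 12   # number of command types (one-hot dimension)
PARAM_DIM = 257      # 256 quantized parameter bins + 1 usage index
MAX_SEQ_LEN = 32     # fixed command-sequence length

def tokenize_sequence(commands):
    """Encode a CAD command sequence as (type, parameter) arrays.

    `commands` is a list of (type_id, quantized_params) pairs, with
    type_id in [0, 12) and quantized_params a length-256 integer array.
    Unused slots keep the padding index -1.
    """
    types = np.zeros((MAX_SEQ_LEN, NUM_CMD_TYPES), dtype=np.float32)
    params = np.full((MAX_SEQ_LEN, PARAM_DIM), -1, dtype=np.int64)
    for i, (type_id, quantized) in enumerate(commands[:MAX_SEQ_LEN]):
        types[i, type_id] = 1.0      # 12-dimensional one-hot command type ti
        params[i, :256] = quantized  # quantized 256-dimensional parameters
        params[i, 256] = 1           # index flagging that this slot is used
    return types, params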
E. 3D CAD Generation from Point Cloud

As highlighted in the overview of existing 3D CAD generative models, the reverse engineering of CAD shapes has primarily been explored through methods utilizing either CAD sketches, B-Rep data, or sequences of CAD commands as input. These approaches have made promising advances so far in reconstructing and generating CAD models. However, an equally crucial aspect is generating CAD data from alternative raw geometric data modalities, such as point clouds. This consideration also aligns with the future work outlined by the DeepCAD method. The ability to seamlessly convert diverse data representations into CAD models opens new avenues for real-world applications, bridging the gap between different data modalities and expanding the utility of CAD technology. This becomes especially critical in scenarios where new variations of a physical object are needed or when repairing a machinery object without access to the corresponding CAD model. This can be particularly challenging in instances where the object predates the digital era of manufacturing. In these scenarios, the process typically begins with scanning the object using a 3D sensor, which generates a point cloud. Subsequently, the acquired point cloud data needs to be decomposed into a collection of geometric primitives, such as curves or surfaces, to be interpreted by CAD software. The traditional three-step procedure [114] involves converting the point cloud into a mesh, explaining it through parametric surfaces to create a solid (B-Rep), and inferring a CAD program. Recent advancements in fitting primitives to point clouds [115]–[117] have managed to bypass the initial step of converting point clouds to meshes. Nevertheless, a notable limitation of these methods is their reliance on a finite set of fixed and disjoint primitives, which poses a challenge for convenient shape editing in subsequent steps.

Addressing this, the recently proposed Point2Cyl [32] method frames the problem as an Extrusion Cylinder decomposition task, leveraging a neural network to predict per-point extrusion instances, surface normals, and base/barrel membership. These geometric proxies can then be used to estimate the extrusion parameters through differentiable and closed-form formulations. In this method, an Extrusion Cylinder is considered a fundamental primitive that signifies an extruded 2D sketch, characterized by parameters such as the extrusion axis, center, sketch, and sketch scale, which can be used to represent 3D CAD models. The terms base and barrel denote specific surfaces of an extrusion cylinder: the base/top plane and the side surface, respectively. Figure 14 shows an example of an Extrusion Cylinder obtained by extruding a circle as the base 2D sketch.

Fig. 14: An example of an Extrusion Cylinder. The 2D sketch is a circle here (right), which is extruded to make the Extrusion Cylinder (left) [32].

Point2Cyl leverages PointNet++ [8] to learn point cloud feature embeddings, which are then passed into two distinct fully connected networks: one for point cloud segmentation into extrusion cylinders and base/barrel membership, and one for surface normal prediction. The approach is evaluated on the Fusion Gallery and DeepCAD datasets, outperforming baselines and showcasing its effectiveness in reconstruction and shape editing. The code of this method is publicly available.9

9 point2cyl.github.io
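To give a flavor of these closed-form estimates, consider the extrusion axis: barrel-surface normals are perpendicular to the axis, so the axis can be recovered as the smallest-eigenvalue eigenvector of the weighted covariance of the predicted normals. The sketch below is a simplified illustration of this idea under our own assumptions, not the authors' exact formulation:

import torch

def estimate_extrusion_axis(normals, barrel_weights):
    """Recover an extrusion axis from predicted per-point normals.

    normals:        (N, 3) unit normals predicted by the network.
    barrel_weights: (N,) soft barrel-membership weights in [0, 1].
    The axis minimizes sum_i w_i (n_i . a)^2, i.e., it is the eigenvector
    of the weighted normal covariance with the smallest eigenvalue. The
    whole computation is differentiable, so it can sit inside training.
    """
    weighted = barrel_weights.unsqueeze(1) * normals    # (N, 3)
    cov = weighted.t() @ normals                        # (3, 3) covariance
    eigenvalues, eigenvectors = torch.linalg.eigh(cov)  # ascending order
    axis = eigenvectors[:, 0]                           # smallest eigenvalue
    return axis / axis.norm()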
Nevertheless, it is important to note that this method is limited in its ability to handle input data that is noisy or distorted. Reconstructing the sharp edges and surfaces of prismatic 3D shapes from noisy point cloud data is challenging, given that point clouds inherently offer only an approximate representation of the 3D shape. This challenge is particularly pronounced when dealing with point clouds acquired through low-cost scanners, where any distortions or irregularities present in the shape may be addressed by smoothing during the surface reconstruction process. Lambourne et al. [33] proposed a method, concurrently with Point2Cyl, addressing the reconstruction of sharp prismatic shapes when provided with an approximate rounded (smoothed) point cloud. This approach introduces a differentiable pipeline that reconstructs a target shape in terms of voxels while extracting geometric parameters. An autoencoder network is trained to process a signed distance function represented as a voxel grid, obtained through methods that convert dense point clouds into signed distance functions [118], [119]. The encoder produces an embedding vector, and the decoders further decompose the shape into 2D profile images and 1D envelope arrays. During inference, CAD data is generated by searching a repository of 2D constrained sketches, extruding them, and combining them via Boolean operations to construct the final CAD model. Evaluation on the ABC dataset demonstrates the method's ability to better approximate target shapes compared to the DeepCAD method.
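As a toy illustration of this profile/envelope decomposition (our own simplification, not the pipeline of [33]), a prismatic solid's voxel occupancy can be composed from a binary 2D profile image and a binary 1D envelope array along the extrusion axis:

import numpy as np

def prismatic_occupancy(profile, envelope):
    """Compose a prismatic voxel grid from a 2D profile and a 1D envelope.

    profile:  (H, W) binary image, 1 inside the 2D sketch profile.
    envelope: (D,) binary array, 1 on the occupied extrusion interval.
    A voxel is filled iff its footprint lies inside the profile and its
    height lies inside the envelope.
    """
    return envelope[:, None, None] * profile[None, :, :]   # (D, H, W)

# Example: a disc profile extruded over the middle half of the height,
# which yields a cylinder like the one in Fig. 14.
h = w = d = 32
yy, xx = np.mgrid[0:h, 0:w]
profile = ((xx - w / 2) ** 2 + (yy - h / 2) ** 2 <= (w / 4) ** 2).astype(np.float32)
envelope = np.zeros(d, dtype=np.float32)
envelope[d // 4 : 3 * d // 4] = 1.0
voxels = prismatic_occupancy(profile, envelope)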
Concurrently with the two previous works, ComplexGen [34] has introduced a new approach, ComplexNet, for directly generating B-Rep data from point clouds. This approach reframes the reconstruction task as a holistic detection of geometric primitives and their interconnections, encapsulated within a chain complex structure. ComplexNet, as a neural network architecture, harnesses a sparse CNN for embedding point cloud features and a tri-path Transformer decoder to produce three distinct groups of geometric primitives along with their mutual relationships, defined as adjacency matrices. Subsequently, a global optimization step refines the predicted probabilistic structure into a definite B-Rep chain complex, considering structural validity constraints and geometric refinements. Extensive experiments on the ABC dataset demonstrate the effectiveness of this approach in generating structurally complete and accurate CAD B-Rep models.

VI. DISCUSSION AND FUTURE WORKS

Although Geometric Deep Learning (GDL) methods have made remarkable progress in analyzing CAD models and automating the design process at different levels, several challenges remain in this field. In this section, some of these challenges are discussed and potential future research directions for tackling them are proposed.

Limited Annotated B-Rep Data: Despite the release in recent years of several large-scale CAD datasets that include the B-Rep format alongside conventional 3D data formats, there remains a considerable demand for annotated datasets for supervised learning-based approaches. The datasets annotated for CAD model classification are still limited in size, lacking the diversity and complexity needed for a comprehensive analysis of CAD models. While there are large-scale annotated datasets for common object CAD shapes, like ShapeNet [62], these datasets are only available in mesh format. Consequently, transferring knowledge from common object datasets to mechanical datasets is only viable when using mesh or point cloud data, not B-Rep. On the other hand, annotating mechanical CAD models is challenging and requires domain expertise to recognize the type, functionality, or other properties of mechanical objects. Inconsistencies in expert annotations may arise due to the different terminologies used for these objects in different industries. Similar challenges exist for datasets annotated for CAD segmentation into different faces. As shown in Table III, there are only a few annotated datasets in this regard, with different types of annotation, such as how different faces are manufactured or the CAD operation used to construct each face. These datasets lack shared classes in their surface types, making it challenging to transfer learned features from one dataset to another using transfer learning or domain generalization approaches. More extensive datasets, like MFCAD++ extending the MFCAD dataset, are needed in this domain, with different annotation types. Collecting large-scale CAD datasets that provide various data formats, including mesh, point cloud, and B-Rep, featuring a diverse range of CAD models with various complexities, and annotating them for different tasks, such as CAD classification and segmentation, would significantly contribute to future research and development in this field.

Analysis on Complex CAD Assemblies: Existing works for analyzing CAD assemblies focus on how to connect simple parts (solids), predicting the joint coordinates and the direction of the connection. A future research direction could be considering how to assemble multiple solids hierarchically to make a more complex CAD model, or how to connect two sub-assemblies, each containing multiple parts (solids), to make a bigger assembly. However, the existing datasets are not annotated for such applications, since the joints between different parts are not explicitly specified in the B-Rep data by default. The AutoMate [19] and Fusion 360 Assembly-Joint [18] datasets are annotated specifically for their corresponding methods. Another open problem in analyzing complex CAD assemblies is segmenting each assembly into its building blocks at different levels (either sub-assemblies or parts (solids)). Learning how to replace a part in an assembly with another part from a repository, while considering all its connections to other parts, its complexity, functionality, materials, etc., would be a great advantage in automating the CAD model customization process. Annotating datasets for this goal is also necessary.
Representation Learning on B-Rep Data: The initial and most important step in training deep learning models on B-Rep data involves creating numerical feature representations in a format suitable for deep learning architectures. Many methods for CAD classification and segmentation, particularly those based on UV-Net, use UV-grid sampling. However, UV-grid features lack permutation invariance. In other words, when the CAD solid is represented as a face adjacency graph, the arrangement of nodes in the graph is crucial for the deep learning model to recognize the graph structure and its similarity to other graphs (or CAD solids), while two graphs with different node arrangements might still be similar. Hence, exploring various invariances of the B-Rep graph and alternative methods of representing its data in a structured format, while considering the relative orientation of the solid faces and edges, holds promise for future research.
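A small example makes the invariance issue concrete: a sum (or mean) readout over per-face embeddings yields the same graph-level embedding under any re-indexing of the faces, whereas order-sensitive representations such as a flattened feature matrix do not. The sketch below uses hypothetical feature sizes of our own choosing:

import torch

faces = torch.randn(5, 8)             # per-face embeddings, e.g., pooled UV-grid features
perm = torch.tensor([4, 2, 0, 3, 1])  # the same five faces, re-indexed

# A sum readout is permutation-invariant: both orderings give one embedding.
assert torch.allclose(faces.sum(dim=0), faces[perm].sum(dim=0))

# A flattened feature matrix is order-sensitive: re-indexing changes it.
assert not torch.equal(faces.flatten(), faces[perm].flatten())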
Unsupervised and Self-Supervised Methods: Given the shortage of annotated data for supervised learning in CAD, there is significant potential to enhance self-supervised and/or unsupervised methods, leveraging large-scale datasets like ABC. For instance, UV-Net utilized Graph Contrastive Learning (GCL) for self-supervised learning, and [13] explored shape rasterization to obtain self-supervision from the data, training an autoencoder and using its decoder for other supervised tasks. Investigating diverse forms of self-supervision in CAD data for training autoencoders or graph autoencoders would be intriguing. Examining and assessing different graph transformations in GCL approaches, to understand the structure of the B-Rep graph locally and globally while maintaining various invariances, is a crucial avenue for future research. Additionally, focusing on Variational Autoencoders for direct synthesis on B-Rep data, generating meaningful new shapes from the same distribution as existing shapes, would enhance similarity analysis and CAD retrieval.
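For instance, a node-dropping augmentation in the spirit of [83] could generate two stochastic views of the same B-Rep face-adjacency graph, forming a positive pair for a contrastive loss; which transformations are actually appropriate for B-Rep data is precisely the open question. A minimal sketch under these assumptions:

import torch

def drop_nodes(x, adj, drop_ratio=0.2):
    """Randomly drop nodes of a face-adjacency graph (a GCL augmentation).

    x:   (N, F) node features of the B-Rep graph.
    adj: (N, N) adjacency matrix.
    Two augmented views of the same solid act as a positive pair for the
    contrastive objective; views of different solids act as negatives.
    """
    n = x.size(0)
    keep = torch.rand(n) >= drop_ratio
    keep[torch.randint(n, (1,))] = True          # keep at least one node
    idx = keep.nonzero(as_tuple=True)[0]
    return x[idx], adj[idx][:, idx]

x, adj = torch.randn(6, 16), (torch.rand(6, 6) > 0.5).float()
view_1 = drop_nodes(x, adj)   # first stochastic view of the solid
view_2 = drop_nodes(x, adj)   # second view: together they form a positive pair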
CAD Generation and B-Rep Synthesis: The current generative methods for CAD commands often focus on a limited set of operations, like Sketch and Extrude, restricting the complexity and diversity of the resulting CAD models. Notably, in certain methods, such as DeepCAD [17], there is no assurance that all generated CAD command sequences will yield topologically valid CAD models, particularly in cases of complex models with long command sequences. Therefore, one avenue for future exploration involves expanding generative methods to encompass a broader range of CAD operations, such as Fillet and Chamfer, allowing the generation of command sequences for more complex CAD shapes. Additionally, advancements in methods like SolidGen [15] and CADParser [16], which directly focus on synthesizing B-Rep data without relying on CAD command sequences, mark a promising direction. Despite these advancements, there is still room for creativity and improvement in this area. Another direction for future research involves extending generative methods to produce CAD construction operations or B-Rep synthesis from other 3D data formats, like mesh, voxel, and point cloud. Recent progress in converting point clouds to CAD models, described in Section V-E, opens possibilities for transferring knowledge across different data domains. Moreover, generating 3D CAD models from noisy scans or hand-drawn sketches is in high demand in CAD tools and presents significant potential for improvement. Although some methods have been introduced in this domain, the lack of a paired sketch and B-Rep dataset makes training supervised learning methods currently infeasible. For example, the sketches in SketchGraphs and the 3D models in the ABC dataset are both collected from the Onshape repository; however, the 2D and 3D models in these two datasets are not paired. Collecting such a paired dataset would greatly aid future research.

Reproducibility: In this domain, a significant challenge lies in reproducing and comparing the experimental results of different methodologies. The absence of large-scale annotated benchmark datasets leads each method proposed for machine learning-based CAD analysis to either introduce a new annotated CAD dataset tailored for the specific task or modify and annotate a portion of a large-scale dataset for evaluation. Supervised methods trained on these smaller datasets often exhibit high performance, leaving little room for improvement. Moreover, since each method is benchmarked on a dataset adapted for its specific task, comparing results becomes complex. For instance, when analyzing the results of different methods evaluated on a subset of the Fusion 360 segmentation dataset in [13], it is unclear which part of the dataset each method used for evaluation. Reproducing the results of different methods and conducting performance comparisons become especially challenging when researchers do not release their preprocessed data or do not provide clear preprocessing instructions. Another hurdle in replicating research results is the reliance of released codes on CAD kernels that are not open-source. For instance, replicating the results of AutoMate [19] requires installing the Parasolid kernel, which is typically accessible only to industrial developers and companies. This limited availability makes it challenging for independent academic researchers to utilize and build upon such research works. While codes dependent on the open-source OpenCASCADE kernel are more accessible, many methods in this field either use constraint solvers through CAD tools during training or are otherwise reliant on a CAD tool kernel, necessitating access and licensing for at least one CAD software package. To address these issues, it is highly recommended that researchers provide comprehensive documentation for their released codes, detailing the data preprocessing setup and offering sufficient information on the experimental settings and code dependencies. This transparency can significantly facilitate future research efforts in the field.

VII. CONCLUSION

Developing and training Geometric Deep Learning (GDL) models to learn and reason about CAD designs holds the promise of revolutionizing design workflows, bringing in greater efficiency. However, extending machine learning-based methods to complex parametric data, like B-Rep, poses a crucial and challenging task. This survey offers a comprehensive review of GDL methods tailored for CAD data analysis. It presents detailed explanations, comparisons, and summaries within two primary categories: 1) CAD representation learning, encompassing supervised and self-supervised methods designed for CAD classification, segmentation, assembly, and retrieval, and 2) CAD generation, involving generative methods for 2D and 3D CAD construction. Additionally, benchmark datasets and open-source codes are introduced.
The survey concludes by discussing the challenges inherent in this rapidly evolving field and proposes potential avenues for future research.

ACKNOWLEDGEMENT

The research leading to the results of this paper received funding from the Thomas B. Thriges Foundation and the Industriens Foundation as part of the AI Supported Modular Design and Implementation project.

REFERENCES

[1] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep learning: Grids, groups, graphs, geodesics, and gauges,” arXiv preprint arXiv:2104.13478, 2021.
[2] E. Kalogerakis, M. Averkiou, S. Maji, and S. Chaudhuri, “3d shape segmentation with projective convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[3] Y. Feng, Y. Feng, H. You, X. Zhao, and Y. Gao, “Meshnet: Mesh neural network for 3d shape representation,” in AAAI Conference on Artificial Intelligence, 2019.
[4] D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural network for real-time object recognition,” in IEEE International Conference on Intelligent Robots and Systems (IROS), 2015.
[5] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[6] C. Wang, M. Cheng, F. Sohel, M. Bennamoun, and J. Li, “Normalnet: A voxel-based cnn for 3d object classification and retrieval,” Neurocomputing, vol. 323, pp. 139–147, 2019.
[7] T. Le and Y. Duan, “Pointgrid: A deep network for 3d shape understanding,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[8] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[9] P. K. Jayaraman, A. Sanghi, J. G. Lambourne, K. D. Willis, T. Davies, H. Shayani, and N. Morris, “Uv-net: Learning from boundary representations,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[10] C. Krahe, A. Bräunche, A. Jacob, N. Stricker, and G. Lanza, “Deep learning for automated product design,” CIRP Design Conference, 2020.
[11] C. Krahe, M. Marinov, T. Schmutz, Y. Hermann, M. Bonny, M. May, and G. Lanza, “Ai based geometric similarity search supporting component reuse in engineering design,” CIRP Design Conference, 2022.
[12] D. Machalica and M. Matyjewski, “Cad models clustering with machine learning,” Archive of Mechanical Engineering, vol. 66, no. 2, 2019.
[13] B. T. Jones, M. Hu, M. Kodnongbua, V. G. Kim, and A. Schulz, “Self-supervised representation learning for cad,” in IEEE Conference on Computer Vision and Pattern Recognition, 2023.
[14] J. G. Lambourne, K. D. Willis, P. K. Jayaraman, A. Sanghi, P. Meltzer, and H. Shayani, “Brepnet: A topological message passing system for solid models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[15] P. K. Jayaraman, J. G. Lambourne, N. Desai, K. D. D. Willis, A. Sanghi, and N. J. W. Morris, “Solidgen: An autoregressive model for direct b-rep synthesis,” Transactions on Machine Learning Research, 2023.
[16] S. Zhou, T. Tang, and B. Zhou, “Cadparser: A learning approach of sequence modeling for b-rep cad,” in International Joint Conference on Artificial Intelligence, 2023.
[17] R. Wu, C. Xiao, and C. Zheng, “Deepcad: A deep generative network for computer-aided design models,” in IEEE International Conference on Computer Vision, 2021.
[18] K. D. Willis, P. K. Jayaraman, H. Chu, Y. Tian, Y. Li, D. Grandi, A. Sanghi, L. Tran, J. G. Lambourne, A. Solar-Lezama et al., “Joinable: Learning bottom-up assembly of parametric cad joints,” in IEEE Conference on Computer Vision and Pattern Recognition, 2022.
[19] B. Jones, D. Hildreth, D. Chen, I. Baran, V. G. Kim, and A. Schulz, “Automate: A dataset and learning approach for automatic mating of cad assemblies,” ACM Transactions on Graphics (TOG), vol. 40, no. 6, pp. 1–18, 2021.
[20] A. Seff, Y. Ovadia, W. Zhou, and R. P. Adams, “Sketchgraphs: A large-scale dataset for modeling relational geometry in computer-aided design,” arXiv preprint arXiv:2007.08506, 2020.
[21] K. D. Willis, P. K. Jayaraman, J. G. Lambourne, H. Chu, and Y. Pu, “Engineering sketch generation for computer-aided design,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[22] C. Li, H. Pan, A. Bousseau, and N. J. Mitra, “Free2cad: Parsing freehand drawings into cad commands,” ACM Transactions on Graphics (TOG), vol. 41, no. 4, pp. 1–16, 2022.
[23] A. Seff, W. Zhou, N. Richardson, and R. P. Adams, “Vitruvion: A generative model of parametric CAD sketches,” in International Conference on Learning Representations (ICLR), 2022.
[24] C. Li, H. Pan, A. Bousseau, and N. J. Mitra, “Sketch2cad: Sequential cad modeling by sketching in context,” ACM Transactions on Graphics (TOG), vol. 39, no. 6, pp. 1–14, 2020.
[25] E. Dupont, K. Cherenkova, A. Kacem, S. A. Ali, I. Arzhannikov, G. Gusev, and D. Aouada, “Cadops-net: Jointly learning cad operation types and steps from boundary-representations,” in International Conference on 3D Vision (3DV), IEEE, 2022.
[26] W. Para, S. Bhat, P. Guerrero, T. Kelly, N. Mitra, L. J. Guibas, and P. Wonka, “Sketchgen: Generating constrained cad sketches,” Advances in Neural Information Processing Systems, vol. 34, pp. 5077–5088, 2021.
[27] Y. Ganin, S. Bartunov, Y. Li, E. Keller, and S. Saliceti, “Computer-aided design as language,” Advances in Neural Information Processing Systems, vol. 34, pp. 5885–5897, 2021.
[28] F. Hähnlein, C. Li, N. J. Mitra, and A. Bousseau, “Cad2sketch: Generating concept sketches from cad sequences,” ACM Transactions on Graphics (TOG), vol. 41, no. 6, pp. 1–18, 2022.
[29] X. Xu, W. Peng, C.-Y. Cheng, K. D. Willis, and D. Ritchie, “Inferring cad modeling sequences using zone graphs,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[30] K. D. Willis, Y. Pu, J. Luo, H. Chu, T. Du, J. G. Lambourne, A. Solar-Lezama, and W. Matusik, “Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences,” ACM Transactions on Graphics (TOG), vol. 40, no. 4, pp. 1–24, 2021.
[31] X. Xu, K. D. D. Willis, J. G. Lambourne, C. Cheng, P. K. Jayaraman, and Y. Furukawa, “Skexgen: Autoregressive generation of CAD construction sequences with disentangled codebooks,” in International Conference on Machine Learning, 2022.
[32] M. A. Uy, Y.-Y. Chang, M. Sung, P. Goel, J. G. Lambourne, T. Birdal, and L. J. Guibas, “Point2cyl: Reverse engineering 3d objects from point clouds to extrusion cylinders,” in IEEE Conference on Computer Vision and Pattern Recognition, 2022.
[33] J. G. Lambourne, K. Willis, P. K. Jayaraman, L. Zhang, A. Sanghi, and K. R. Malekshan, “Reconstructing editable prismatic cad from rounded voxel models,” in SIGGRAPH Asia 2022 Conference Papers, 2022.
[34] H. Guo, S. Liu, H. Pan, Y. Liu, X. Tong, and B. Guo, “Complexgen: Cad reconstruction by b-rep chain complex generation,” ACM Transactions on Graphics (TOG), vol. 41, no. 4, pp. 1–18, 2022.
[35] M. Groover and E. Zimmers, CAD/CAM: Computer-Aided Design and Manufacturing. Pearson Education, 1983.
[36] Autodesk, “Autocad,” 1982. [Online]. Available: https://www.autodesk.com/products/autocad/overview?term=1-YEAR&tab=subscription
[37] ——, “Fusion 360,” 2013. [Online]. Available: https://www.autodesk.com/products/fusion-360/overview?term=1-YEAR&tab=subscription
[38] Dassault Systèmes, “Solidworks,” 1995. [Online]. Available: https://www.3ds.com/products/solidworks
[39] ——, “Catia,” 1977. [Online]. Available: https://www.3ds.com/products-services/catia/
[40] PTC, “Onshape,” 2015. [Online]. Available: https://www.onshape.com/en/
[41] ——, “Creo,” 2011. [Online]. Available: https://www.ptc.com/en/products/creo
[42] Kai Backman, Mikko Mononen, “Tinkercad,” 2011. [Online]. Available: https://www.tinkercad.com
[43] Thomas Paviot, “pythonocc-core,” 2022. [Online]. Available: https://doi.org/10.5281/zenodo.3605364
[44] “Opencascade technology (occt),” 2002. [Online]. Available: https://dev.opencascade.org
[45] Pradeep Kumar Jayaraman, Joseph Lambourne, “Occwl,” 2023. [Online]. Available: https://github.com/AutodeskAILab/occwl
[46] “Cadquery,” 2019. [Online]. Available: https://github.com/CadQuery/cadquery/tree/master
[47] “Parasolid,” 1980. [Online]. Available: https://www.plm.automation.siemens.com/global/en/products/plm-components/parasolid.html
[48] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
[49] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2020.
[50] N. Heidari and A. Iosifidis, “Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition,” in International Conference on Pattern Recognition (ICPR), IEEE, 2021.
[51] ——, “Progressive spatio-temporal graph convolutional network for skeleton-based human action recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
[52] L. Hedegaard, N. Heidari, and A. Iosifidis, “Continual spatio-temporal graph convolutional networks,” Pattern Recognition, vol. 140, p. 109528, 2023.
[53] E. Mansimov, O. Mahmood, S. Kang, and K. Cho, “Molecular geometry prediction using a deep generative graph neural network,” Scientific Reports, vol. 9, no. 1, p. 20381, 2019.
[54] Y. Wang, J. Wang, Z. Cao, and A. Barati Farimani, “Molecular contrastive learning of representations via graph neural networks,” Nature Machine Intelligence, vol. 4, no. 3, pp. 279–287, 2022.
[55] K. Baltakys, M. Baltakienė, N. Heidari, A. Iosifidis, and J. Kanniainen, “Predicting the trading behavior of socially connected investors: Graph neural network approach with implications to market surveillance,” Expert Systems with Applications, vol. 228, p. 120285, 2023.
[56] L. Yang, J. Zhuang, H. Fu, X. Wei, K. Zhou, and Y. Zheng, “Sketchgnn: Semantic sketch segmentation with graph neural networks,” ACM Transactions on Graphics (TOG), vol. 40, no. 3, pp. 1–13, 2021.
[57] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017.
[58] N. Heidari and A. Iosifidis, “Progressive graph convolutional networks for semi-supervised node classification,” IEEE Access, vol. 9, pp. 81957–81968, 2021.
[59] N. Heidari, L. Hedegaard, and A. Iosifidis, “Graph convolutional networks,” in Deep Learning for Robot Perception and Cognition. Elsevier, 2022, pp. 71–99.
[60] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, “Learning deep generative models of graphs,” arXiv preprint arXiv:1803.03324, 2018.
[61] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser, “The princeton shape benchmark,” in Proceedings Shape Modeling Applications, IEEE, 2004, pp. 167–178.
[62] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su et al., “Shapenet: An information-rich 3d model repository,” arXiv preprint arXiv:1512.03012, 2015.
[63] Q. Zhou and A. Jacobson, “Thingi10k: A dataset of 10,000 3d-printing models,” arXiv preprint arXiv:1605.04797, 2016.
[64] K. Mo, S. Zhu, A. X. Chang, L. Yi, S. Tripathi, L. J. Guibas, and H. Su, “Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[65] F. Bogo, J. Romero, G. Pons-Moll, and M. J. Black, “Dynamic faust: Registering human bodies in motion,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6233–6242.
[66] M. Aubry, D. Maturana, A. A. Efros, B. C. Russell, and J. Sivic, “Seeing 3d chairs: Exemplar part-based 2d-3d alignment using a large dataset of cad models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3762–3769.
[67] S. Song, S. P. Lichtenberg, and J. Xiao, “Sun rgb-d: A rgb-d scene understanding benchmark suite,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[68] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[69] S. Jayanti, Y. Kalyanaraman, N. Iyer, and K. Ramani, “Developing an engineering shape benchmark for cad models,” Computer-Aided Design, vol. 38, no. 9, pp. 939–953, 2006.
[70] S. Kim, H.-g. Chi, X. Hu, Q. Huang, and K. Ramani, “A large-scale annotated mechanical components benchmark for classification and retrieval tasks with deep neural networks,” in European Conference on Computer Vision, Springer, 2020.
[71] D. Bespalov, C. Y. Ip, W. C. Regli, and J. Shaffer, “Benchmarking cad search techniques,” in ACM Symposium on Solid and Physical Modeling, 2005.
[72] Z. Zhang, P. Jaiswal, and R. Rai, “Featurenet: Machining feature recognition based on 3d convolution neural network,” Computer-Aided Design, vol. 101, pp. 12–22, 2018.
[73] B. Manda, P. Bhaskare, and R. Muthuganapathy, “A convolutional neural network approach to the classification of engineering models,” IEEE Access, vol. 9, pp. 22711–22723, 2021.
[74] A. Angrish, B. Craver, and B. Starly, ““fabsearch”: A 3d cad model-based search engine for sourcing manufacturing services,” Journal of Computing and Information Science in Engineering, vol. 19, no. 4, p. 041006, 2019.
[75] L. Mandelli and S. Berretti, “Cad 3d model classification by graph neural networks: A new approach based on step format,” arXiv preprint arXiv:2210.16815, 2022.
[76] W. Cao, T. Robinson, Y. Hua, F. Boussuge, A. R. Colligan, and W. Pan, “Graph representation of 3d cad models for machining feature recognition with deep learning,” in International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, 2020.
[77] S. Koch, A. Matveev, Z. Jiang, F. Williams, A. Artemov, E. Burnaev, M. Alexa, D. Zorin, and D. Panozzo, “Abc: A big cad model dataset for geometric deep learning,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[78] M. Eitz, J. Hays, and M. Alexa, “How do humans sketch objects?” ACM Transactions on Graphics (TOG), vol. 31, no. 4, pp. 1–10, 2012.
[79] P. Sangkloy, N. Burnell, C. Ham, and J. Hays, “The sketchy database: Learning to retrieve badly drawn bunnies,” ACM Transactions on Graphics (TOG), vol. 35, no. 4, pp. 1–12, 2016.
[80] Y. Gryaditskaya, M. Sypesteyn, J. W. Hoftijzer, S. C. Pont, F. Durand, and A. Bousseau, “Opensketch: A richly-annotated dataset of product design sketches,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 232–1, 2019.
[81] A. R. Colligan, T. T. Robinson, D. C. Nolan, Y. Hua, and W. Cao, “Hierarchical cadnet: Learning from b-reps for machining feature recognition,” Computer-Aided Design, vol. 147, p. 103226, 2022.
[82] T. G. Gunn, “The mechanization of design and manufacturing,” Scientific American, vol. 247, no. 3, pp. 114–131, 1982.
[83] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph contrastive learning with augmentations,” Advances in Neural Information Processing Systems, vol. 33, pp. 5812–5823, 2020.
[84] M. Al-Wswasi, A. Ivanov, and H. Makatsoris, “A survey on smart automated computer-aided process planning (acapp) techniques,” The International Journal of Advanced Manufacturing Technology, vol. 97, pp. 809–832, 2018.
[85] Y. Shi, Y. Zhang, K. Xia, and R. Harik, “A critical review of feature recognition techniques,” Computer-Aided Design and Applications, vol. 17, no. 5, pp. 861–899, 2020.
[86] G. Zhan, Q. Fan, K. Mo, L. Shao, B. Chen, L. J. Guibas, H. Dong et al., “Generative 3d part assembly via dynamic graph learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 6315–6326, 2020.
[87] H. Lin, M. Averkiou, E. Kalogerakis, B. Kovacs, S. Ranade, V. Kim, S. Chaudhuri, and K. Bala, “Learning material-aware local descriptors for 3d shapes,” in International Conference on 3D Vision (3DV), IEEE, 2018.
[88] K. Mo, P. Guerrero, L. Yi, H. Su, P. Wonka, N. Mitra, and L. J. Guibas, “Structurenet: Hierarchical graph networks for 3d shape generation,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 242:1–242:19, 2019.
[89] R. K. Jones, T. Barton, X. Xu, K. Wang, E. Jiang, P. Guerrero, N. J. Mitra, and D. Ritchie, “Shapeassembly: Learning to generate programs for 3d shape structure synthesis,” ACM Transactions on Graphics (TOG), vol. 39, no. 6, pp. 1–20, 2020.
[90] Z. Wu, X. Wang, D. Lin, D. Lischinski, D. Cohen-Or, and H. Huang, “Sagnet: Structure-aware generative network for 3d-shape modeling,” ACM Transactions on Graphics (TOG), vol. 38, no. 4, pp. 1–14, 2019.
[91] K. Yin, Z. Chen, S. Chaudhuri, M. Fisher, V. G. Kim, and H. Zhang, “Coalesce: Component assembly by learning to synthesize connections,” in International Conference on 3D Vision (3DV), IEEE, 2020.
[92] C. Zhu, K. Xu, S. Chaudhuri, R. Yi, and H. Zhang, “Scores: Shape composition with recursive substructure priors,” ACM Transactions on Graphics (TOG), vol. 37, no. 6, pp. 1–14, 2018.
[93] A. N. Harish, R. Nagar, and S. Raman, “Rgl-net: A recurrent graph learning framework for progressive part assembly,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2022.
[94] Y. Lee, E. S. Hu, and J. J. Lim, “Ikea furniture assembly environment for long-horizon complex manipulation tasks,” in IEEE International Conference on Robotics and Automation (ICRA), 2021.
[95] M. Sung, H. Su, V. G. Kim, S. Chaudhuri, and L. Guibas, “Complementme: Weakly-supervised component suggestions for 3d modeling,” ACM Transactions on Graphics (TOG), vol. 36, no. 6, pp. 1–12, 2017.
[96] X. Wang, B. Zhou, Y. Shi, X. Chen, Q. Zhao, and K. Xu, “Shape2motion: Joint analysis of motion parts and attributes from 3d shapes,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[97] A. Zhao, J. Xu, M. Konaković-Luković, J. Hughes, A. Spielberg, D. Rus, and W. Matusik, “Robogrammar: Graph grammar for terrain-optimized robot design,” ACM Transactions on Graphics (TOG), vol. 39, no. 6, pp. 1–16, 2020.
[98] F. Boussuge, C. M. Tierney, T. T. Robinson, and C. G. Armstrong, “Application of tensor factorisation to analyze similarities in cad assembly models,” in International Meshing Roundtable and User Forum, vol. 1, 2019.
[99] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM Transactions on Graphics (TOG), vol. 38, no. 5, pp. 1–12, 2019.
[100] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra, “Pcpnet: Learning local shape properties from raw point clouds,” in Computer Graphics Forum, vol. 37, no. 2, Wiley Online Library, 2018, pp. 75–85.
[101] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” in International Conference on Learning Representations (ICLR), 2022.
[102] T. Funkhouser, M. Kazhdan, P. Shilane, P. Min, W. Kiefer, A. Tal, S. Rusinkiewicz, and D. Dobkin, “Modeling by example,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 652–663, 2004.
[103] D. Ha and D. Eck, “A neural representation of sketch drawings,” in International Conference on Learning Representations (ICLR), 2018.
[104] J. Jongejan, H. Rowley, T. Kawashima, J. Kim, and N. Fox-Gieg, “The quick, draw!-ai experiment,” Mount View, CA, accessed Feb, vol. 17, no. 2018, p. 4, 2016.
[105] M. Zhang and Y. Chen, “Link prediction based on graph neural networks,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[106] C. Nash, Y. Ganin, S. A. Eslami, and P. Battaglia, “Polygen: An autoregressive generative model of 3d meshes,” in International Conference on Machine Learning, PMLR, 2020.
[107] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[108] K. Varda, “Protocol buffers: Google’s data interchange format,” Google Open Source Blog, Available at least as early as Jul, vol. 72, p. 23, 2008.
[109] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations (ICLR), 2021.
[110] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[111] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in International Conference on Machine Learning, PMLR, 2017.
[112] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas, “Learning representations and generative models for 3d point clouds,” in International Conference on Machine Learning, PMLR, 2018.
[113] K. Wang, J. Zheng, and Z. Zhou, “Neural face identification in a 2d wireframe projection of a manifold object,” in IEEE Conference on Computer Vision and Pattern Recognition, 2022.
[114] P. Benkő and T. Várady, “Segmentation methods for smooth point regions of conventional engineering objects,” Computer-Aided Design, vol. 36, no. 6, pp. 511–523, 2004.
[115] T. Birdal, B. Busam, N. Navab, S. Ilic, and P. Sturm, “Generic primitive detection in point clouds using novel minimal quadric fits,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 6, pp. 1333–1347, 2019.
[116] L. Li, M. Sung, A. Dubrovina, L. Yi, and L. J. Guibas, “Supervised fitting of geometric primitives to 3d point clouds,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[117] C. Sommer, Y. Sun, E. Bylow, and D. Cremers, “Primitect: Fast continuous hough voting for primitive detection,” in International Conference on Robotics and Automation (ICRA), IEEE, 2020.
[118] J. A. Bærentzen, “Robust generation of signed distance fields from triangle meshes,” in International Workshop on Volume Graphics, IEEE, 2005.
[119] M. Sanchez, O. Fryazinov, and A. Pasko, “Efficient evaluation of continuous signed distance to a polygonal mesh,” in Spring Conference on Computer Graphics, 2012.