
GAPartManip: A Large-scale Part-centric Dataset for

Material-Agnostic Articulated Object Manipulation


Wenbo Cui*1,2, Chengyang Zhao*3,4, Songlin Wei*3,6, Jiazhao Zhang3,6, Haoran Geng3,5, Yaran Chen1, He Wang†2,3,6

arXiv:2411.18276v1 [cs.RO] 27 Nov 2024

Abstract— Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduce a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluate the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we propose a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios.

I. INTRODUCTION

Fig. 1. GAPartManip. We introduce a large-scale part-centric dataset for material-agnostic articulated object manipulation. It encompasses 19 common household articulated categories, totaling 918 object instances, 240k photo-realistic rendering images, and 8 billion scene-level actionable interaction poses. GAPartManip enables robust zero-shot sim-to-real transfer for accomplishing articulated object manipulation tasks.

Articulated objects are ubiquitous in people's daily lives, ranging from tabletop items like microwaves and kitchen pots to larger items like cabinets and washing machines. Unlike simple, single-function rigid objects, articulated objects consist of multiple parts with different functions, featuring varied geometric shapes and kinematic structures, which makes generalizable perception and manipulation of them highly non-trivial [1]. Some existing works try to simplify this problem by developing intermediate representations that implicitly encode the similarities across different objects, such as affordance [2]–[5] and motion flow [6]–[8], thereby achieving generalization across objects. Another series of works [9]–[11] tackles articulated object perception and manipulation based on a more explicit and fundamental concept called Generalizable and Actionable Part (GAPart), demonstrating greater manipulation capability attributable to its 7-DoF pose representation compared to the value-map representation of visual affordance. However, we observe that two critical limitations impede their real-world performance.

First, the material of articulated objects significantly impacts the quality of point cloud data. Most existing work relies on point clouds, and these methods struggle due to the sim-to-real gap of depth estimation [9], [10], [12], [13]. Some neural-based stereo-matching depth reconstruction methods have been proposed and show some success on rigid objects [14], [15]. These methods use neural networks to encode the disparity in stereo infrared (IR) patterns projected by structured-light cameras. However, due to the limited diversity of stereo IR datasets, these methods are constrained to small rigid objects and perform poorly on large articulated objects.

Second, no existing method can predict stable and actionable interactive poses across categories for articulated objects. Some works employ heuristic-based methods [9] to interact with articulated objects, but they are limited in diversity and fail to account for the geometric details necessary for robust interactions in real-world settings [3]. Some grasping pose prediction methods for rigid objects can generate stable poses.

*Equal Contribution. 1 Institute of Automation, Chinese Academy of Sciences; 2 Beijing Academy of Artificial Intelligence; 3 CFCS, School of Computer Science, Peking University; 4 Carnegie Mellon University; 5 University of California, Berkeley; 6 Galbot. †Corresponding to [email protected].
However, due to the lack of data on articulated objects, it is challenging for these methods to discern whether each link can be interacted with independently, resulting in poses that are mostly non-actionable [16]. Affordance-based methods [2], [13], [17] have received widespread attention for interacting with articulated objects by generating heatmaps. However, these heatmaps are ambiguous, hard to annotate, and struggle to produce stable grasping poses [12].

In this paper, we address these limitations from a data-centric perspective. We introduce GAPartManip, a novel large-scale synthetic dataset that features two important aspects: (1) realistic, physics-based IR image rendering of various parts in diverse scenes, and (2) part-oriented actionable interaction pose annotations for a wide range of articulated objects. GAPartManip inherits 918 object instances across 19 categories from the previous GAPartNet dataset [9]. Leveraging these assets, we developed a data generation pipeline for part manipulation, producing the synthetic data needed to address the aforementioned limitations. To improve generalizability and mitigate the sim-to-real gap, we incorporate domain randomization techniques [15] during data generation, ensuring a diverse range of outputs. In total, our dataset contains approximately 14,000 scene-level samples with 8 billion part-oriented actionable pose annotations, encompassing a wide array of physical materials, object states, and camera perspectives.

Trained on the proposed dataset, we obtain a depth reconstruction network and an actionable pose prediction network that separately address the two limitations mentioned above. Moreover, we compose these two networks as modules of a novel articulated object manipulation framework. Through extensive experiments in both synthetic and real-world settings, our method achieves state-of-the-art (SOTA) performance in both individual module experiments and part manipulation experiments.

To summarize, our main contributions are as follows:
• We introduce GAPartManip, a novel large-scale dataset of various articulated objects featuring realistic, physics-based rendering and diverse scene-level, part-oriented actionable interaction pose annotations.
• We propose a novel framework for articulated object manipulation and evaluate each module separately, demonstrating superior effectiveness and robustness compared to baseline methods.
• We conduct comprehensive experiments in the real world and achieve SOTA performance on articulated object manipulation tasks.

II. RELATED WORK

A. Articulated object dataset

Articulated object datasets and modeling constitute a crucial and longstanding research field in 3D vision and robotics, encompassing a wide range of work in perception [9], [18]–[23], generation [24]–[28], and manipulation [9]–[11], [29]–[33]. As for manipulation datasets, GAPartNet [9] annotates 6-DoF part poses to manipulate parts. GraspNet [34] and Contact-GraspNet [35] build several datasets, but these all focus on rigid objects, neglecting the kinematic semantics specific to articulated objects. Where2Act [2] first introduces a data generation pipeline for articulated objects, generating data by sampling successful poses in a simulator. AO-Grasp [16] leverages a curvature-based sampling method to accelerate data collection and proposes an 87k dataset of actionable poses. RPMArt [12] manually annotates affordance maps for articulated objects and provides data rendered in SAPIEN [1]. None of the current datasets provide sufficient photo-realistic rendering data to improve algorithms' perception of articulated objects during sim-to-real transfer, limiting real-world performance, especially with imperfect point clouds [10], [12]. Additionally, their data collection processes are inefficient and result in small datasets, hindering generalization to unknown objects. This work aims to create a large-scale dataset with diverse photo-realistic and actionable pose data covering all types of GAParts.

B. Articulated object manipulation

Due to their unique kinematic structures and geometric shapes, articulated objects present significant manipulation challenges. Current methods can be broadly categorized into learning-based methods and prediction-planning methods. Learning-based methods, such as reinforcement learning [10], [31] and imitation learning [32], [36], require a large amount of high-quality robot demonstrations. However, collecting such data is both impractical and time-consuming, and their sim-to-real performance relies heavily on the simulator. Current prediction-planning methods [2], [9], [11], [37]–[39] focus on visual affordance but offer ambiguous interactive poses and struggle to generalize due to limited data. They rely on 3D point clouds, ignoring the impact of object materials. In the real world, depth cameras often miss critical parts like handles and lids, reducing sim-to-real performance.

III. GAPARTMANIP DATASET

A. Overview

We construct a large-scale dataset, GAPartManip, to address, from a data-centric perspective, both the depth estimation and the actionable interaction pose prediction challenges in real-world articulated object manipulation. It contains 19 common household articulated categories from GAPartNet, including Box, Bucket, CoffeeMachine, Dishwasher, Door, KitchenPot, Laptop, Microwave, Oven, Printer, Refrigerator, Safe, StorageFurniture, Suitcase, Table, Toaster, Toilet, TrashCan, and WashingMachine, comprising a total of 918 object instances after removing problematic assets. We build a photo-realistic rendering pipeline for each asset in indoor scenes, rendering RGB images, IR images, depth maps, and part-level segmentations. Additionally, we create high-quality, physically plausible interaction pose annotations for each part of every articulated object, and then leverage our GPU-accelerated scene-level pose annotation pipeline to generate dense, part-oriented actionable interaction pose annotations for each data sample. Our dataset contains over 8 billion actionable poses across 241,680 data samples.
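Concretely, a single scene-level sample bundles the modalities listed above. The record layout below is our own illustrative assumption for exposition, not the released file format; the D415-like resolution is also assumed.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneSample:
    """One rendered scene-level sample (hypothetical layout).

    Mirrors the modalities listed in Sec. III-A: an RGB image, a stereo
    IR pair, a depth map, part-level segmentation, and actionable poses.
    """
    rgb: np.ndarray        # (H, W, 3) uint8 color image
    ir_left: np.ndarray    # (H, W) uint8 left IR image
    ir_right: np.ndarray   # (H, W) uint8 right IR image
    depth: np.ndarray      # (H, W) float32 depth map in meters
    part_mask: np.ndarray  # (H, W) int32 part-level segmentation labels
    poses: np.ndarray      # (K, 4, 4) actionable gripper poses in SE(3)

H, W = 480, 848  # assumed sensor-like resolution
sample = SceneSample(
    rgb=np.zeros((H, W, 3), np.uint8),
    ir_left=np.zeros((H, W), np.uint8),
    ir_right=np.zeros((H, W), np.uint8),
    depth=np.zeros((H, W), np.float32),
    part_mask=np.zeros((H, W), np.int32),
    poses=np.eye(4, dtype=np.float32)[None].repeat(16, axis=0),
)
```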
Fig. 2. Data Examples in GAPartManip. GAPartManip is a novel large-scale synthetic dataset for articulated objects, featuring two important aspects: (1) realistic, physics-based IR rendering for various object materials in diverse scenes, and (2) part-oriented actionable interaction pose annotations for a wide range of articulated objects. Each column shows a data sample; from top to bottom, each column displays the RGB image, the IR image (only the left IR image is shown here), and the scene-level actionable interaction pose annotations.

Fig. 2 shows examples of data samples from our dataset. Our whole data generation pipeline is illustrated in Fig. 3.

Fig. 3. Dataset Generation Pipeline. For scene-level data sample rendering, we input the object asset into our photo-realistic rendering pipeline, generating one RGB image and two IR images (left and right) for each camera perspective. For pose annotation, we begin by performing mesh fusion on each GAPart of the object to establish a one-to-one correspondence between GAParts and meshes. Then, we use FPS to obtain the point cloud for each GAPart, enabling part-level stable interaction pose annotation. These poses are further utilized for scene-level actionable interaction pose annotation for each rendered data sample.

B. Photo-realistic Scene-level Rendering

Our photo-realistic rendering pipeline is built upon NVIDIA Isaac Sim [40]. Specifically, we simulate the RGB and IR imaging process of the Intel RealSense D415, a structured-light camera widely used for real-world depth estimation in previous research. We replicate the layout of the D415 imaging system, which consists of four hardware modules: an IR projector, an RGB camera, and two infrared (IR) cameras. We also project a pattern similar to that of the D415 onto the scenes.

Inspired by previous works [14], [41], we incorporate domain randomization techniques into our rendering pipeline to mimic IR imaging under the various lighting conditions and material properties encountered in the real world. We render each object in 20 different scenes with various domain randomization settings. Concretely, we randomly vary the ambient lighting, background, and object material properties in each scene, generating more diverse data that covers a wider range of real-world imaging conditions. We further randomize the ambient light positions and intensities within each scene. More importantly, we randomize the parameters of all diffuse, transparent, specular, and metal materials of each part according to their semantics. Finally, we uniformly randomize the joint poses of the object within its joint limits in each scene during the rendering process.

We render the objects and parts from different distances: each scene is rendered with 5 object-centric camera perspectives for the whole object and 5 part-centric camera perspectives for each part. For the object-centric perspective, which places the whole object within the camera view, the camera is positioned at a latitude in [10°, 60°] and a longitude in [-60°, 60°] relative to the target object. For the part-centric perspective, which captures the more fine-grained parts, we leverage the part pose annotations from GAPartNet and the current joint poses to determine the position and orientation of each part in the scene. The camera is then randomly positioned around each part, aiming directly toward the part center, so that the target part occupies the primary area of the image. During this process, camera viewpoints are randomly sampled within a latitude range of [0°, 60°] and a longitude range of [-75°, 75°].

C. GPU-accelerated Scene-level Pose Annotation

a) Part-level Stable Pose Annotation: We employ a pose sampling strategy similar to GraspNet [34] to annotate dense and diverse stable interaction poses for each GAPart, based on the original semantic annotations in GAPartNet [9]. First, we perform mesh fusion for each part, merging the meshes corresponding to the same part to establish a one-to-one correspondence between parts and meshes. Then, we apply Farthest Point Sampling (FPS) to downsample the mesh of each part, resulting in N candidate points for pose sampling. For each candidate point, we uniformly generate V × A × D candidate poses, where V is the number of gripper views distributed uniformly over a spherical surface, A is the number of in-plane gripper rotations, and D is the number of gripper depths. In our case, N = 512, V = 64, A = 12, and D = 4. We follow GraspNet to calculate the pose score based on antipodal analysis.

b) Scene-level Actionable Pose Annotation: To obtain part-centric grasping poses, we first project the part-level grasping poses into the scene using the part pose annotations,
and then filter out unreasonable and unreachable poses. More concretely, we classify grasping poses that do not align with the single-view partial point cloud as unreasonable, and we consider poses that cause collisions between the gripper and other objects as unreachable.

However, such a filtering process is computationally demanding due to the large number of points in the scene. To accelerate the pose annotation process, we implemented a CUDA-based optimization of the filtering step. Our optimization reduces the processing time from 5 minutes to less than 2 seconds per part, a nearly 150-fold speed-up. As a result, the originally year-long pose annotation process can now be completed within 3 days.

IV. FRAMEWORK

We propose a novel framework to address cross-category articulated object manipulation in real-world settings. As illustrated in Fig. 4, the framework primarily consists of three modules: a depth reconstruction module, a pose prediction module, and a local planner module.

A. Depth reconstruction module

The input to our system is a single-view RGB-D observation comprising a raw depth map I_d, a left IR image I_ir^l, a right IR image I_ir^r, and an RGB image I_c. However, raw sensor depth is often incomplete and even incorrect, because transparent and reflective surfaces are inherently ambiguous for structured-light and Time-of-Flight depth cameras. Therefore, we leverage diffusion model-based approaches to estimate and restore the incomplete depth of raw sensor outputs. We use D3RoMa [14] as our depth predictor and fine-tune it on our dataset.

B. Pose prediction module

Unlike 6-DoF grasping pose prediction for rigid object manipulation, we need to predict both the 6-DoF part grasping pose and the 2-DoF movement direction after grasping. We adapt the SOTA method EconomicGrasp [42] as our actionable pose estimator, dubbed Part-aware EcoGrasp, and use the pretrained GAPartNet [9] to predict the part movement direction.

To precisely annotate the part-centric interaction pose, we propose actionness in place of the graspness used in EconomicGrasp. To annotate actionness, we first denote the scene as a point cloud P = {p_i | i = 1, ..., N} with N points. Then, for each point p_i, we uniformly discretize its sphere space into V = {v_j | j = 1, ..., V} approaching directions. For each view v_j of point p_i, we generate L actionable pose candidates A_k^{i,j} ∈ SE(3), indexed by k ∈ [1, L], by grid sampling along gripper depths and in-plane rotation angles, respectively. We employ antipodal analysis [34] to calculate the grasping quality score q_k^{i,j} ∈ [0, 1.2]. Next, we define an actionable label c_a^i ∈ {0, 1} for each point, indicating whether the point lies on an interactable part. We also define a scene-level collision label c_k^{i,j} ∈ {0, 1} for each pose, indicating whether the pose causes a collision. Finally, the point-wise actionness score s_i^P and the view-wise actionness score s_i^V are defined as:

    s_i^P = (1 / Σ_{j,k} A_k^{i,j}) Σ_{j,k} c_a^i 1(q_k^{i,j} > T) c_k^{i,j},    (1)

    s_i^V = (1 / Σ_k A_k^{i,j}) Σ_k c_a^i 1(q_k^{i,j} > T) c_k^{i,j},    (2)

where T is a predefined threshold to filter out poses of inferior quality. We then train Part-aware EcoGrasp following [42]. Additionally, we utilize the pre-trained GAPartNet [9] to predict the motion direction, which specifies the part movement direction after grasping the actionable part.

C. Local planner module

We use cuRobo [43] as our motion planner. It optimizes motion trajectories based on the actionable poses given by the pose prediction module, computes joint angles through inverse kinematics, and drives the robot to execute trajectory actions through joint control. Subsequently, the robot executes actions along the motion direction r⃗_p.

V. EXPERIMENTS

We conduct experiments for each module. The depth estimation and actionable pose prediction experiments illustrate the significance of our dataset for articulated object manipulation tasks, while real-world experiments compare the performance of our framework with existing methods. We also perform ablation studies for each module.

A. Depth Estimation Experiments

In this section, we evaluate different depth estimation methods with GAPartManip to demonstrate the effectiveness of our dataset for improving articulated object depth estimation in both simulation and the real world.

Data Preparation. We split the dataset into training and test sets using an approximate 8:2 ratio. To maintain comprehensive coverage, each object category is split carefully, ensuring that both the training and test sets include samples from all categories. Additionally, we make sure that samples rendered from the same object instance are assigned exclusively to either the training or the test set. We compare our method with the following baselines:
• SGM [44] is one of the most widely used traditional algorithms for dense binocular stereo matching.
• RAFT-Stereo (RS) [45] is a learning-based binocular stereo matching architecture built upon the dense optical flow estimation framework RAFT [46], using an iterative update strategy to recursively refine the disparity map.
• D3RoMa (DR) [14] is a SOTA learning-based stereo depth estimation framework based on the diffusion model. It excels at restoring noisy depth maps, especially for transparent and specular surfaces.
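All three baselines estimate disparity, which is converted to metric depth via the standard stereo relation z = f·B/d. A minimal sketch of this conversion follows; the focal length and baseline are placeholder values, not the actual D415 calibration. It also illustrates why a small disparity error (EPE) can become a large depth error (MAE) at range.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Convert a stereo disparity (in pixels) to metric depth via z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Placeholder intrinsics: 640 px focal length, 55 mm stereo baseline.
f_px, b_m = 640.0, 0.055
z = disparity_to_depth(16.0, f_px, b_m)  # 640 * 0.055 / 16 = 2.2 m
# A 1 px disparity error at this range shifts the depth by roughly 0.13 m,
# which is why small EPE differences translate into noticeable MAE gaps.
z_err = abs(disparity_to_depth(17.0, f_px, b_m) - z)
```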
Fig. 4. Framework Overview. Given IR images and raw depth, the depth reconstruction module first performs depth recovery. Subsequently, the pose prediction module generates a 7-DoF actionable pose and a motion direction based on the reconstructed depth. Finally, the local planner module carries out the action execution.
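The modular composition in Fig. 4 can be sketched as a simple sequential pipeline. The sketch below shows the control flow only; the module interfaces and names are our own placeholders, not the paper's released code.

```python
from typing import Callable

def run_pipeline(
    observation,                    # raw sensor observation (IR pair + raw depth)
    reconstruct_depth: Callable,    # depth reconstruction module
    predict_poses: Callable,        # actionable pose + motion direction prediction
    plan_and_execute: Callable,     # local planner (e.g., a cuRobo-style planner)
):
    """Sequential flow: depth recovery -> pose prediction -> motion planning."""
    depth = reconstruct_depth(observation)
    poses, motion_dir = predict_poses(depth)
    best = max(poses, key=lambda p: p["score"])  # pick the top-scored pose
    return plan_and_execute(best, motion_dir)

# Tiny stand-in modules to exercise the control flow.
result = run_pipeline(
    observation={"raw_depth": None},
    reconstruct_depth=lambda obs: "reconstructed-depth",
    predict_poses=lambda d: ([{"pose": "T1", "score": 0.4},
                              {"pose": "T2", "score": 0.9}], "pull-open"),
    plan_and_execute=lambda pose, direction: (pose["pose"], direction),
)
# result == ("T2", "pull-open")
```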

Evaluation Metrics. We evaluate the estimated disparity and depth using the following metrics:
• EPE: mean absolute difference between the ground-truth and estimated disparity maps across all pixels.
• RMSE: root mean square of depth errors across all pixels.
• MAE: mean absolute depth error across all pixels.
• REL: mean relative depth error across all pixels.
• δ_i: percentage of pixels satisfying max(d/d̂, d̂/d) < δ_i, where d denotes the estimated depth and d̂ denotes the ground truth.

TABLE I
QUANTITATIVE RESULTS FOR DEPTH ESTIMATION IN SIMULATION

Methods    EPE ↓  RMSE ↓  REL ↓  MAE ↓  δ1.05 ↑  δ1.10 ↑  δ1.25 ↑
SGM [44]   6.82   1.623   0.561  0.794  34.71    38.94    46.27
RS [45]    5.28   1.497   0.506  0.618  36.82    41.05    49.92
DR [14]    2.82   0.732   0.268  0.317  46.22    67.62    83.09
RS* [45]   2.79   0.798   0.247  0.309  52.83    68.30    80.15
Ours*      0.69   0.225   0.041  0.050  86.22    93.45    97.41

* indicates that the method is trained on the GAPartManip dataset.

Fig. 5. Qualitative Results for Depth Estimation in the Real World. Our refined depth is more robust for transparent and translucent lids and small handles compared to RAFT-Stereo. Zoom in to better observe small parts like handles and knobs.

Results and Analysis. The quantitative results in simulation are presented in Tab. I. They indicate that the traditional stereo matching algorithm, SGM, struggles in scenes containing articulated objects with challenging material characteristics; the same observation applies to the pre-trained RAFT-Stereo. Meanwhile, the pre-trained D3RoMa models demonstrate reasonably good stereo depth estimation capabilities in the experiments. However, both RAFT-Stereo and D3RoMa are significantly enhanced when fine-tuned on GAPartManip. Specifically, RAFT-Stereo achieves a 150% improvement in MAE compared to its pre-trained version, while our model exhibits a 600% improvement, achieving the best performance in simulation. As illustrated in Fig. 5, the fine-tuned models also demonstrate strong depth estimation performance in real-world scenarios. In particular, in real-world environments with challenging materials, as shown in the first three rows of the figure, our model significantly outperforms the fine-tuned RAFT-Stereo and the raw depth, exhibiting noticeably better robustness. Both the simulation and real-world experiments demonstrate the effectiveness of our proposed GAPartManip in substantially improving depth estimation for articulated objects with challenging materials.

B. Actionable Pose Prediction Experiments

In this section, we evaluate the impact of our dataset on improving methods for articulated object actionable pose estimation.

Data Preparation. We split the dataset into training and testing sets using an approximate 7:3 ratio. Specifically, we further divide the test sets into 3 categories: seen instances, unseen but similar instances, and novel instances. We compare our methods with the following baselines:
• GSNet (GS) [47] is a grasping pose prediction model trained on the GraspNet-1Billion [34] dataset for rigid objects. We evaluate both the pre-trained model and the fine-tuned model separately.
• Where2Act (WA) [2] is an affordance-based method for interacting with articulated objects. Unlike the original approach, we do not train a separate network for each task. As Where2Act cannot generate stable grasping poses, we integrate GSNet, as referenced in [12], to enhance Where2Act's capabilities to match our experimental setting.
• EconomicGrasp (EG) [42] is also a pose prediction method for rigid objects, which includes an interactive grasp head and composite score estimation to enhance the precision of specific grasps.
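The split protocol used in both experiments, where every category appears on both sides but each rendered object is confined to exactly one side, can be sketched as follows; the category names, instance IDs, and ratio here are illustrative only.

```python
import random
from collections import defaultdict

def split_by_instance(instances, train_ratio=0.7, seed=0):
    """Split (category, instance_id) pairs so every instance lands in exactly
    one side, while every category appears on both sides when possible."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for cat, inst in instances:
        by_cat[cat].append(inst)
    train, test = [], []
    for cat, insts in by_cat.items():
        rng.shuffle(insts)
        k = max(1, int(len(insts) * train_ratio))
        if k == len(insts) and len(insts) > 1:
            k -= 1  # keep at least one test instance per category
        train += [(cat, i) for i in insts[:k]]
        test += [(cat, i) for i in insts[k:]]
    return train, test

# Illustrative instance lists for two categories.
items = [("Microwave", f"m{i}") for i in range(10)] + [("Box", f"b{i}") for i in range(4)]
train, test = split_by_instance(items)
overlap = set(train) & set(test)  # empty: no instance appears on both sides
```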
Fig. 6. Qualitative comparison of actionable pose prediction on synthetic data. Each sample shows the RGB input alongside predictions from the pre-trained EconomicGrasp and from our method.

Fig. 7. Qualitative Results for Real-world Manipulation. The top-15 scored actionable poses are displayed, with the red gripper representing the top-1 pose.

Evaluation Metrics. Following [34], we utilize precision to evaluate the performance of actionable pose estimation:

    Precision_µ = n_suc_µ / n_grasp,    (3)

where Precision_µ represents the success rate (SR) of the predicted interaction poses at friction coefficient µ, n_grasp denotes the number of predicted poses, and n_suc_µ denotes the number of successful grasps predicted under µ.
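Eq. (3) is a simple ratio over the predicted poses; a minimal sketch follows, with illustrative success labels rather than real evaluation results.

```python
def precision_at_mu(successes, n_grasp):
    """Eq. (3): fraction of the n_grasp predicted poses that succeed at a given
    friction coefficient mu; `successes` holds per-pose 0/1 success labels."""
    if n_grasp == 0:
        return 0.0
    return sum(successes) / n_grasp

# 31 predicted poses, of which 19 succeed (illustrative numbers).
labels = [1] * 19 + [0] * 12
p = precision_at_mu(labels, len(labels))  # 19 / 31
```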
Results and Analysis. Our quantitative results in simulation are presented in Tab. II. Once trained on our data, both GSNet and our Part-aware EcoGrasp outperform Where2Act, possibly because Where2Act struggles with cross-category and cross-action reasoning. Part-aware EcoGrasp and the fine-tuned GSNet show a substantial improvement in precision compared to the pretrained models. It is evident that our data significantly enhances the capability of existing methods for actionable pose estimation on articulated objects. Specifically, our dataset offers strong geometric priors for parts, enabling networks to focus on actionable parts rather than non-actionable links. For instance, although the pre-trained EconomicGrasp in Fig. 6 generates a set of stable grasping poses, it cannot differentiate whether these poses act on actionable parts, meaning they may fail when interacting with articulated objects.

TABLE II
QUANTITATIVE RESULTS FOR ACTIONABLE POSE PREDICTION IN SIMULATION

            Seen                 Unseen               Novel
Method      P     P0.8  P0.4     P     P0.8  P0.4     P     P0.8  P0.4
GS [47]     13.28 11.55 6.70     17.36 15.57 9.19     9.76  8.43  5.25
EG [42]     24.72 19.65 9.97     23.91 20.29 9.90     14.56 12.02 9.23
GS* [47]    25.70 20.26 9.00     25.45 20.28 9.67     23.99 20.55 11.20
WA* [2]     14.43 12.44 6.53     11.04 7.41  2.52     4.17  1.85  0.47
Ours*       55.33 51.19 30.25    56.26 53.02 32.91    41.65 39.06 23.25

* indicates that the method is trained on the GAPartManip dataset.

C. Real-World Experiments

To validate the sim-to-real generalizability of GAPartManip, we conduct real-world experiments. We use a Franka robot arm with an Intel RealSense camera to capture depth and IR images. We compare our method with three baselines: Where2Act, AO-Grasp, and GSNet; as in Sec. V-B, we modify the Where2Act interaction pipeline to perform our tasks. The experiment covers 7 distinct instances, including StorageFurniture, Box, and Microwave, and evaluates the success rate of the top-1 interactive pose for each method across open (n=14) and close (n=17) tasks. As shown in Tab. III, the overall success rate of our method is 61.29%, showcasing not only a successful transfer to the real world but also a significant performance boost compared to the other methods.

Additionally, we perform ablation studies to assess how the different modules affect the overall pipeline performance. As shown in Fig. 7, depth cameras yield poor depth data when faced with certain materials, significantly impacting subsequent manipulation; our depth reconstruction module effectively addresses this issue by repairing the 2D depth map, thereby enhancing the performance of subsequent modules. Similarly, as shown in Fig. 7, our method tends to prioritize interactable GAParts. This part-aware capability may explain the significant performance disparities seen in Tab. III.

TABLE III
REAL-WORLD ARTICULATED OBJECT MANIPULATION RESULTS

                                Success Rate (%) ↑
Method                          Open   Close  Overall
AO-Grasp                        28.57  29.41  29.03
Where2Act                       21.42  17.64  19.35
GSNet                           42.85  23.53  32.25
Ours w/o Part-aware EcoGrasp    64.28  41.17  51.61
Ours w/o Depth Reconstruction   50.00  29.41  38.70
Ours                            64.28  58.82  61.29

VI. CONCLUSIONS

In this paper, we build a large-scale synthetic dataset for generalizable and actionable part manipulation of material-agnostic articulated objects. Our dataset is the first large-scale articulated object dataset that is diverse in instances, categories, scenes, and materials. Meanwhile, we propose an articulated object manipulation framework capable of zero-shot transfer to the real world.
We conduct experiments on the individual modules and on the overall real-world pipeline, with results indicating the competitiveness of our approach. Our dataset will be released.

REFERENCES

[1] F. Xiang, Y. Qin, K. Mo, Y. Xia, H. Zhu, F. Liu, M. Liu, H. Jiang, Y. Yuan, H. Wang, L. Yi, A. X. Chang, L. J. Guibas, and H. Su, "SAPIEN: A simulated part-based interactive environment," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[2] K. Mo, L. J. Guibas, M. Mukadam, A. Gupta, and S. Tulsiani, "Where2act: From pixels to actions for articulated 3d objects," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 6813–6823.
[3] R. Wu, Y. Zhao, K. Mo, Z. Guo, Y. Wang, T. Wu, Q. Fan, X. Chen, L. Guibas, and H. Dong, "Vat-mart: Learning visual action trajectory proposals for manipulating 3d articulated objects," arXiv preprint arXiv:2106.14440, 2021.
[4] Y. Wang, R. Wu, K. Mo, J. Ke, Q. Fan, L. J. Guibas, and H. Dong, "Adaafford: Learning to adapt manipulation affordance for 3d articulated objects via few-shot interactions," in European Conference on Computer Vision. Springer, 2022, pp. 90–107.
[5] Y. Zhao, R. Wu, Z. Chen, Y. Zhang, Q. Fan, K. Mo, and H. Dong, "Dualafford: Learning collaborative visual affordance for dual-gripper manipulation," arXiv preprint arXiv:2207.01971, 2022.
[6] B. Eisner, H. Zhang, and D. Held, "Flowbot3d: Learning 3d articulation flow to manipulate articulated objects," arXiv preprint arXiv:2205.04382, 2022.
[7] H. Zhang, B. Eisner, and D. Held, "Flowbot++: Learning generalized articulated objects manipulation via articulation projection," arXiv
[18] L. Yi, H. Huang, D. Liu, E. Kalogerakis, H. Su, and L. Guibas, "Deep part induction from articulated object pairs," arXiv preprint arXiv:1809.07417, 2018.
[19] C. Deng, J. Lei, W. B. Shen, K. Daniilidis, and L. J. Guibas, "Banana: Banach fixed-point network for pointcloud segmentation with inter-part equivariance," in NeurIPS, 2024.
[20] X. Li, H. Wang, L. Yi, L. J. Guibas, A. L. Abbott, and S. Song, "Category-level articulated object pose estimation," in CVPR, 2020.
[21] G. Liu, Q. Sun, H. Huang, C. Ma, Y. Guo, L. Yi, H. Huang, and R. Hu, "Semi-weakly supervised object kinematic motion prediction," in CVPR, 2023.
[22] J. Lyu, Y. Chen, T. Du, F. Zhu, H. Liu, Y. Wang, and H. Wang, "Scissorbot: Learning generalizable scissor skill for paper cutting via simulation, imitation, and sim2real," in 8th Annual Conference on Robot Learning, 2024. [Online]. Available: https://openreview.net/forum?id=PAtsxVz0ND
[23] J. Zhang, N. Gireesh, J. Wang, X. Fang, C. Xu, W. Chen, L. Dai, and H. Wang, "Gamma: Graspability-aware mobile manipulation policy learning based on online grasping pose fusion," in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 1399–1405.
[24] Q. Chen, M. Memmel, A. Fang, A. Walsman, D. Fox, and A. Gupta, "Urdformer: Constructing interactive realistic scenes from real images via simulation and generative modeling," in Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition @ CoRL 2023, 2023.
[25] J. Mu, W. Qiu, A. Kortylewski, A. Yuille, N. Vasconcelos, and X. Wang, "A-sdf: Learning disentangled signed distance functions for articulated shape representation," in ICCV, 2021.
[26] Z. Jiang, C.-C. Hsu, and Y. Zhu, "Ditto: Building digital twins of articulated objects from interaction," in CVPR, 2022.
[27] W.-C. Tseng, H.-J. Liao, L. Yen-Chen, and M. Sun, "Cla-nerf:
preprint arXiv:2306.12893, 2023. 1 Category-level articulated neural radiance field,” in ICRA, 2022. 2
[8] C. Zhong, Y. Zheng, Y. Zheng, H. Zhao, L. Yi, X. Mu, L. Wang, [28] R. Luo, H. Geng, C. Deng, P. Li, Z. Wang, B. Jia, L. Guibas,
P. Li, G. Zhou, C. Yang, et al., “3d implicit transporter for temporally and S. Huang, “Physpart: Physically plausible part completion for
consistent keypoint discovery,” in Proceedings of the IEEE/CVF interactable objects,” 2024. [Online]. Available: https://arxiv.org/abs/
International Conference on Computer Vision, 2023, pp. 3869–3880. 2408.13724 2
1 [29] J. Lei, C. Deng, B. Shen, L. Guibas, and K. Daniilidis, “Nap: Neural
[9] H. Geng, H. Xu, C. Zhao, C. Xu, L. Yi, S. Huang, and H. Wang, 3d articulation prior,” arXiv preprint arXiv:2305.16315, 2023. 2
“Gapartnet: Cross-category domain-generalizable object perception [30] J. Liu, H. I. I. Tam, A. Mahdavi-Amiri, and M. Savva, “Cage:
and manipulation via generalizable and actionable parts,” in Proceed- Controllable articulation generation,” in CVPR, 2024. 2
ings of the IEEE/CVF Conference on Computer Vision and Pattern [31] Y. Geng, B. An, H. Geng, Y. Chen, Y. Yang, and H. Dong, “End-
Recognition, 2023, pp. 7081–7091. 1, 2, 3, 4 to-end affordance learning for robotic manipulation,” in ICRA, 2023.
[10] H. Geng, Z. Li, Y. Geng, J. Chen, H. Dong, and H. Wang, “Partmanip: 2
Learning cross-category generalizable part manipulation policy from [32] R. Gong, J. Huang, Y. Zhao, H. Geng, X. Gao, Q. Wu, W. Ai, Z. Zhou,
point cloud observations,” in Proceedings of the IEEE/CVF Conference D. Terzopoulos, S.-C. Zhu, et al., “Arnold: A benchmark for language-
on Computer Vision and Pattern Recognition, 2023, pp. 2978–2988. grounded task learning with continuous states in realistic 3d scenes,”
1, 2 in ICCV, 2023. 2
[11] H. Geng, S. Wei, C. Deng, B. Shen, H. Wang, and L. Guibas, “Sage: [33] Y. Kuang, J. Ye, H. Geng, J. Mao, C. Deng, L. Guibas,
Bridging semantic and actionable parts for generalizable manipulation H. Wang, and Y. Wang, “Ram: Retrieval-based affordance transfer
of articulated objects,” 2024. 1, 2 for generalizable zero-shot robotic manipulation,” 2024. [Online].
[12] J. Wang, W. Liu, Q. Yu, Y. You, L. Liu, W. Wang, and C. Lu, “Rpmart: Available: https://arxiv.org/abs/2407.04689 2
Towards robust perception and manipulation for articulated objects,” [34] H.-S. Fang, C. Wang, M. Gou, and C. Lu, “Graspnet-1billion: A large-
arXiv preprint arXiv:2403.16023, 2024. 1, 2, 5 scale benchmark for general object grasping,” in Proceedings of the
[13] Y. Geng, B. An, H. Geng, Y. Chen, Y. Yang, and H. Dong, “Rlafford: IEEE/CVF Conference on Computer Vision and Pattern Recognition,
End-to-end affordance learning for robotic manipulation,” in 2023 2020, pp. 11 444–11 453. 2, 3, 4, 5, 6
IEEE International Conference on Robotics and Automation (ICRA), [35] M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox, “Contact-
2023, pp. 5880–5886. 1, 2 graspnet: Efficient 6-dof grasp generation in cluttered scenes,” 2021.
[14] S. Wei, H. Geng, J. Chen, C. Deng, W. Cui, C. Zhao, X. Fang, [Online]. Available: https://arxiv.org/abs/2103.14127 2
L. Guibas, and H. Wang, “D3roma: Disparity diffusion-based depth [36] P.-L. Guhur, S. Chen, R. G. Pinel, M. Tapaswi, I. Laptev, and
sensing for material-agnostic robotic manipulation,” in 8th Annual C. Schmid, “Instruction-driven history-aware policies for robotic ma-
Conference on Robot Learning (CoRL), 2024. 1, 3, 4, 5 nipulations,” in Conference on Robot Learning. PMLR, 2023, pp.
[15] J. Shi, A. Yong, Y. Jin, D. Li, H. Niu, Z. Jin, and H. Wang, 175–187. 2
“Asgrasp: Generalizable transparent object reconstruction and 6-dof [37] W. Liu, J. Mao, J. Hsu, T. Hermans, A. Garg, and J. Wu,
grasp detection from rgb-d active stereo camera,” in 2024 IEEE “Composable part-based manipulation,” 2024. [Online]. Available:
International Conference on Robotics and Automation (ICRA). IEEE, https://arxiv.org/abs/2405.05876 2
2024, pp. 5441–5447. 1, 2 [38] S. Ling, Y. Wang, S. Wu, Y. Zhuang, T. Xu, Y. Li, C. Liu,
[16] C. P. Morlans, C. Chen, Y. Weng, M. Yi, Y. Huang, N. Heppert, and H. Dong, “Articulated object manipulation with coarse-to-fine
L. Zhou, L. Guibas, and J. Bohg, “Ao-grasp: Articulated object grasp affordance for mitigating the effect of point cloud noise,” 2024.
generation,” arXiv preprint arXiv:2310.15928, 2023. 2 [Online]. Available: https://arxiv.org/abs/2402.18699 2
[17] B. An, Y. Geng, K. Chen, X. Li, Q. Dou, and H. Dong, “Rgbmanip: [39] Y. Kuang, J. Ye, H. Geng, J. Mao, C. Deng, L. Guibas, H. Wang, and
Monocular image-based robotic manipulation through active object Y. Wang, “Ram: Retrieval-based affordance transfer for generalizable
pose estimation,” in 2024 IEEE International Conference on Robotics zero-shot robotic manipulation,” in 8th Annual Conference on Robot
and Automation (ICRA). IEEE, 2024, pp. 7748–7755. 2 Learning. 2
[40] J. Liang, V. Makoviychuk, A. Handa, N. Chentanez, M. Macklin, and
D. Fox, “Gpu-accelerated robotic simulation for distributed reinforce-
ment learning,” 2018. 3
[41] Q. Dai, J. Zhang, Q. Li, T. Wu, H. Dong, Z. Liu, P. Tan, and H. Wang,
“Domain randomization-enhanced depth simulation and restoration for
perceiving and grasping specular and transparent objects,” in European
Conference on Computer Vision (ECCV), 2022. 3
[42] X.-M. Wu, J.-F. Cai, J.-J. Jiang, D. Zheng, Y.-L. Wei, and W.-S.
Zheng, “An economic framework for 6-dof grasp detection,” 2024.
[Online]. Available: https://arxiv.org/abs/2407.08366 4, 5, 6
[43] B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk,
V. Blukis, A. Millane, H. Oleynikova, A. Handa, F. Ramos, N. Ratliff,
and D. Fox, “Curobo: Parallelized collision-free robot motion gen-
eration,” in 2023 IEEE International Conference on Robotics and
Automation (ICRA), 2023, pp. 8112–8119. 4
[44] H. Hirschmuller, “Stereo processing by semiglobal matching and
mutual information,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008. 4, 5
[45] L. Lipson, Z. Teed, and J. Deng, “Raft-stereo: Multilevel recurrent field
transforms for stereo matching,” in 2021 International Conference on
3D Vision (3DV). IEEE, 2021, pp. 218–227. 4, 5
[46] Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms
for optical flow,” in Computer Vision–ECCV 2020: 16th European
Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II
16. Springer, 2020, pp. 402–419. 4
[47] C. Wang, H.-S. Fang, M. Gou, H. Fang, J. Gao, and C. Lu, “Grasp-
ness discovery in clutters for fast and accurate grasp detection,” in
Proceedings of the IEEE/CVF International Conference on Computer
Vision, 2021, pp. 15 964–15 973. 5, 6