Structure From Action

Learning Interactions for Articulated Object 3D Structure Discovery

Articulated objects make up a significant portion of our environment. Discovering their parts, joints, and kinematics is crucial for robots to interact with these objects. We introduce Structure from Action (SfA), a framework that discovers the 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions. Our key insight is that 3D interaction and perception should be considered in conjunction to construct 3D articulated CAD models, especially in the case of categories not seen during training. By selecting informative interactions, SfA discovers parts and reveals initially occluded surfaces, like the inside of a closed drawer. By aggregating visual observations in 3D, SfA accurately segments multiple parts, reconstructs part geometry, and infers all joint parameters in a canonical coordinate frame. Our experiments demonstrate that a single SfA model trained in simulation can generalize to many unseen object categories with unknown kinematic structures and to real-world objects. Code and data will be publicly available.


Latest version: arXiv: [cs.CV] or here

Code and instructions will be available.


1 Columbia University            2 Allen Institute for AI


title={Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery}, 
author={Nie, Neil and Gadre, Samir Yitzhak and Ehsani, Kiana and Song, Shuran},
year={2022} }

Method Summary

Given a raw RGB point cloud, SfA infers and executes informative actions to construct an articulated CAD model, which consists of multiple 3D part meshes and the revolute and prismatic joints connecting them. The SfA framework consists of four components: an interaction policy, which chooses informative actions that move parts; a part aggregation module, which tracks part discoveries over a sequence of interactions; a joint estimation module, which predicts the joint parameters and kinematic constraints of the articulation; and a pipeline that constructs the articulated CAD model.
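To make the two joint types concrete, the following is a minimal sketch of how a point on a part moves under a revolute or a prismatic joint. It is an illustration under stated assumptions, not the SfA implementation: the `Joint` class, the `articulate` function, and the parameterization (an axis direction plus a point on the axis) are hypothetical names and choices introduced here.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical joint record; not the representation used by SfA."""
    kind: str      # "revolute" or "prismatic"
    axis: Vec3     # direction of the joint axis (need not be unit length)
    origin: Vec3   # a point on the axis (only used by revolute joints)


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def articulate(point: Vec3, joint: Joint, state: float) -> Vec3:
    """Move `point` by `state`: metres for prismatic, radians for revolute."""
    k = _normalize(joint.axis)
    if joint.kind == "prismatic":
        # Pure translation along the joint axis.
        return tuple(p + state * c for p, c in zip(point, k))
    if joint.kind == "revolute":
        # Rodrigues' rotation formula about the axis through `origin`.
        v = tuple(p - o for p, o in zip(point, joint.origin))
        cos_t, sin_t = math.cos(state), math.sin(state)
        kxv = _cross(k, v)
        kdv = _dot(k, v)
        rotated = tuple(v[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1 - cos_t)
                        for i in range(3))
        return tuple(r + o for r, o in zip(rotated, joint.origin))
    raise ValueError(f"unknown joint kind: {joint.kind}")


# A drawer sliding along x, and a door hinged about the z axis.
drawer = Joint("prismatic", axis=(1.0, 0.0, 0.0), origin=(0.0, 0.0, 0.0))
door = Joint("revolute", axis=(0.0, 0.0, 1.0), origin=(0.0, 0.0, 0.0))
drawer_point = articulate((0.0, 0.0, 0.0), drawer, 0.2)
door_point = articulate((1.0, 0.0, 0.0), door, math.pi / 2)
```

In this sketch a prismatic joint needs only an axis direction, while a revolute joint also needs a point on the axis, which is why the two cases are handled separately.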

SfA Interaction and Perception Pipeline

The video below demonstrates SfA's ability to infer informative multi-step interactions given an articulated object, and to generate the articulated 3D CAD model of the object over time.

3D Articulated CAD Model Results

We compare our method (SfA) with Ditto, a perception network that infers an object's part segmentation and joint parameters from a single-step interaction. Because Ditto covers perception only, we pair it with an interaction policy to form a full pipeline. Our method outperforms the Ditto baseline on both part reconstruction and joint estimation (revolute joints are shown in red, prismatic joints in blue).

Real World Experiments

We validate the generalization of our approach on real-world data. The model performs well on previously unseen instances in the real world despite challenging noise artifacts from the real RGBD camera.

We implement a capture pipeline that uses a 6DoF robot arm with a wrist-mounted camera to capture registered RGBD images of real-world articulated objects. For interactions, we allow a human to move parts. The results below demonstrate successful part and joint discovery and part tracking, thereby validating the sim2real adaptation of the perception module.
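As a rough illustration of what "registered" RGBD capture involves, the sketch below back-projects a depth pixel into 3D with a pinhole camera model and moves it into a shared world frame. This is a sketch under assumptions, not the capture pipeline used in this work: the function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the camera-to-world pose are all hypothetical, and the sketch assumes the camera pose for each frame is already known.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of pixel (u, v) with metric depth into the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(point_cam: Vec3, rotation: List[List[float]], translation: Vec3) -> Vec3:
    """Apply an assumed camera-to-world pose (3x3 rotation, 3-vector translation)."""
    return tuple(
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical intrinsics and pose, chosen only to make the arithmetic easy to follow.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
center_point = backproject(320, 240, 1.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
world_point = to_world(center_point, identity, (0.0, 0.0, 0.5))
```

Applying the same two steps to every valid depth pixel in every frame yields point clouds that share one coordinate frame, which is the precondition for aggregating observations in 3D.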


This work was supported in part by the National Science Foundation under grants 2143601, 2037101, and 2132519. We thank Cheng Chi, Huy Ha, Zhenjia Xu, Zeyi Liu, and other colleagues of the CAIR lab for their valuable feedback and support, and Cheng Chi and Zhenjia Xu for their help with the UR5 robot experiments. We also thank Google for the UR5 robot hardware.


If you have any questions, please feel free to contact Neil.