Abstract:
The computer graphics, 3D computer vision, and robotics communities have produced multiple approaches to representing and generating 3D shapes, as well as a vast number of use cases. These use cases include, but are not limited to, data encoding and compression, shape completion, and reconstruction from partial 3D views. However, controllable 3D shape generation and single-view reconstruction remain relatively unexplored topics that are tightly intertwined and can unlock new design approaches. In this work, we propose a unified 3D shape manipulation and single-view reconstruction framework that builds upon Deep Implicit Templates (DIT) [1], a 3D generative model that can also generate correspondence heat maps for a set of 3D shapes belonging to the same category. For this purpose, we start by providing a comprehensive overview of 3D shape representations and related work, and then describe our framework and proposed methods. Our framework uses ShapeNetV2 [2] as the core dataset and enables finding both unsupervised and supervised directions within the latent space of DIT. More specifically, we use PCA to find unsupervised directions, which are shown to encode a variety of local and global changes across each shape category. In addition, we use the latent codes of encoded shapes and the metadata of the ShapeNet dataset to train linear SVMs and perform supervised manipulation of 3D shapes. Finally, we propose a novel framework that leverages the intermediate latent spaces of the Vision Transformer (ViT) [3] and a joint image-text representation model, CLIP [4], for fast and efficient Single View Reconstruction (SVR). More specifically, we propose a novel mapping network architecture that learns a mapping from the latent spaces of ViT and CLIP to the latent space of DIT. Our results show that our method is view-agnostic and enables high-quality, real-time SVR.
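To make the latent-space manipulation described above concrete, the following is a minimal sketch of how unsupervised directions (via PCA) and a supervised direction (via the normal of a linear SVM decision boundary) could be obtained from a set of DIT latent codes. All file names, array shapes, and the edit step size are assumptions for illustration, not the exact setup used in this work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

# Hypothetical inputs: (N, D) DIT latent codes for one shape category and
# (N,) binary attribute labels derived from ShapeNet metadata.
latents = np.load("chair_latents.npy")  # assumed file
labels = np.load("chair_labels.npy")    # assumed file

# Unsupervised directions: principal components of the latent codes.
pca = PCA(n_components=10)
pca.fit(latents)
unsup_dirs = pca.components_  # (10, D); each row is a candidate edit direction

# Supervised direction: the unit normal of a linear SVM separating shapes
# with and without the target attribute.
svm = LinearSVC(C=1.0).fit(latents, labels)
sup_dir = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# Manipulate a shape by stepping its latent code along a chosen direction;
# decoding the edited code with the DIT decoder yields the edited 3D shape.
def edit(z, direction, alpha):
    return z + alpha * direction

z_edited = edit(latents[0], sup_dir, alpha=3.0)  # alpha is an assumed step size
```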
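Similarly, the SVR mapping network can be pictured as a small regressor from frozen ViT and CLIP image features to a DIT latent code. The sketch below assumes an MLP architecture, an MSE regression loss, and placeholder feature dimensions; the actual architecture, losses, and sizes are those described in the body of this work, not this illustration.

```python
import torch
import torch.nn as nn

# Assumed dimensions: ViT intermediate features, CLIP image embedding,
# and the DIT latent code. Real sizes depend on the chosen backbones.
VIT_DIM, CLIP_DIM, DIT_DIM = 768, 512, 256

class MappingNetwork(nn.Module):
    """Maps frozen ViT + CLIP image features to a DIT latent code (sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VIT_DIM + CLIP_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, DIT_DIM),
        )

    def forward(self, vit_feat, clip_feat):
        # Concatenate the two feature streams and regress the latent code.
        return self.net(torch.cat([vit_feat, clip_feat], dim=-1))

# One assumed training step: regress latent codes of ground-truth shapes.
model = MappingNetwork()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

vit_feat = torch.randn(8, VIT_DIM)    # placeholder features from a frozen ViT
clip_feat = torch.randn(8, CLIP_DIM)  # placeholder features from frozen CLIP
z_gt = torch.randn(8, DIT_DIM)        # placeholder DIT latent codes

loss = loss_fn(model(vit_feat, clip_feat), z_gt)
loss.backward()
opt.step()
```

Because the ViT and CLIP encoders stay frozen and only this lightweight network is trained, a single forward pass suffices at inference time, which is what makes real-time SVR plausible in this setup.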