3D Mesh Convolution with Learned Neighborhood Structure


Published by 白玉兰开源 (Baiyulan Open Source), 4 years ago

#Information Technology

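A mesh is a common 3D shape format, and mesh representation learning is widely used in 3D computer vision and graphics. CNNs have been highly successful on structured data such as images and speech. A mesh, however, is graph-structured: the number of neighbors varies from vertex to vertex, so a CNN cannot be applied to it directly.
We propose a convolution that adapts to the neighborhood structure. For each vertex it learns a weighting matrix, based on that vertex's neighborhood structure, that resamples the neighbors so that the neighbors of every vertex follow a unified implicit order. Standard CNN convolution can then be used for mesh representation learning.
Experiments show that our method substantially outperforms previous methods in both runtime efficiency and reconstruction accuracy.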

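Zhongpai Gao (高忠派) is a postdoctoral researcher at the Artificial Intelligence Institute, Shanghai Jiao Tong University. He received his PhD from the Department of Electronic Engineering at Shanghai Jiao Tong University in 2018. His research covers 3D computer vision and 3D display. During his PhD he visited Harvard Medical School, where he worked on motion sickness induced by 3D displays. He has published dozens of papers in conferences and journals including ACM MM, AAAI, TMM, TCyb, Display, and DSP, and received the Best Paper Award at the DynaVis@CVPR 2020 workshop on dynamic scene reconstruction. As a postdoc he was selected for the National Postdoctoral Program for Innovative Talents and the Shanghai Super Postdoctoral Incentive Program, and was funded by the Young Scientists Fund of the National Natural Science Foundation of China.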


2. Learning Local Neighboring Structure for Robust 3D Shape Representation
Zhongpai Gao¹, Junchi Yan¹, Guangtao Zhai¹, Juyong Zhang², Yiyan Yang¹, and Xiaokang Yang¹
¹Artificial Intelligence Institute, Shanghai Jiao Tong University
²University of Science and Technology of China
April 8th, 2021

3. The success story of deep learning
Medical image processing
The data (e.g., audio and texts, images, CT scans) are regular and grid-structured.
Convolutional neural networks (CNN):
- anisotropic filters
- hierarchical sampling

4. 3D geometric data: meshes
A lot of real-world data do not "live" on grids and are graph-structured:
• social networks, knowledge graphs, molecules, etc.
This paper focuses on 3D meshes that share a fixed-topology template.

5. Enabling technology for many applications
3D shape correspondence [Donati 2020]
3D body reconstruction [Pavlakos 2019]
CG actors in VFX, VR, games: [Curious Case of Benjamin Button], [Beowulf], [Siren GDC 2018]
Consumer applications: animated emojis [Samsung], filters [Instagram]

6. Recap: CNN on Euclidean data
Kipf & Welling (ICLR 2017)

7. Recap: CNN on Euclidean data
W is an anisotropic filter.
Kipf & Welling (ICLR 2017)

8. Definition of graph
$G = (\mathcal{V}, \mathcal{E})$
$\mathcal{V}$: set of nodes $\{x_i\}$, $|\mathcal{V}| = N$
$\mathcal{E}$: set of edges $\{(x_i, x_j)\}$
$\mathcal{N}_i$: set of one-ring neighbors of $x_i$, $\{x_{i,0}, x_{i,1}, \ldots, x_{i,|\mathcal{N}_i|-1}\}$
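
For a triangle mesh, the one-ring neighbor sets defined above can be read off the face list. The following is a minimal sketch, not the authors' code: the function name `one_ring_neighbors` and the toy mesh are hypothetical, chosen only to show that the number of neighbors differs between vertices.

```python
def one_ring_neighbors(faces, num_vertices):
    """For each vertex, return the sorted list of vertices that share an edge with it."""
    neighbors = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        # Every pair of vertices in a triangle is connected by an edge.
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return [sorted(n) for n in neighbors]

# Hypothetical toy mesh: a square (vertices 0..3) split into four
# triangles around a centre vertex 4.
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
rings = one_ring_neighbors(faces, num_vertices=5)
ring_sizes = [len(r) for r in rings]

print(rings)       # neighbor list per vertex
print(ring_sizes)  # corner vertices have 3 neighbors, the centre has 4
```

Because the ring sizes differ, a fixed-size convolution kernel cannot be laid over every vertex in the same way, which is the problem the rest of the talk addresses.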

9. Previous methods
GCN using isotropic filters:
$Y = f(\hat{A} X W)$,
where $\hat{A}$ is the normalized adjacency matrix and $W$ is an isotropic filter.
T. N. Kipf, M. Welling, Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
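
The GCN layer on this slide can be sketched in a few lines of NumPy. The slide only says the adjacency matrix is normalized; the symmetric normalization with self-loops used below follows Kipf & Welling and is an assumption here, as are the ReLU nonlinearity for `f`, the helper names, and the toy graph.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, W):
    """One GCN layer Y = f(A_hat X W), with f assumed to be ReLU."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Hypothetical toy graph: node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # 3 nodes, 4 input features
W = rng.normal(size=(4, 2))   # the same W is applied to every neighbor
Y = gcn_layer(A_hat, X, W)

# Swapping the features of the two neighbors leaves node 0's output unchanged:
# the filter cannot tell the neighbors apart, i.e. it is isotropic.
Y_swapped = gcn_layer(A_hat, X[[0, 2, 1]], W)
print(np.allclose(Y[0], Y_swapped[0]))
```

The final check illustrates what "isotropic" means in practice: the output at a node depends on which features its neighbors carry, but not on which neighbor carries which.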

10. Previous methods
SpiralNet
Bouritsas, Giorgos, et al. "Neural 3d morphable models: Spiral convolutional networks for 3d shape representation learning and generation." Proceedings of the IEEE International Conference on Computer Vision. 2019.

11. Permutation matrix
Inspired by the permutation matrix, we learn a weighting matrix, i.e., a soft permutation matrix, that adapts to the local structure of a vertex.
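
To make the idea concrete, the sketch below contrasts a hard permutation matrix, which only reorders neighbor features, with a hand-picked "soft" weighting matrix that blends them. All values are illustrative assumptions; in the talk the weighting matrix is learned, and nothing here implies it is constrained to look like `P_soft`.

```python
import numpy as np

# Features of three neighbors stored as columns (2 features x 3 neighbors).
X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])

# Hard permutation: each column of P selects exactly one neighbor.
P_hard = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
X_hard = X @ P_hard   # columns of X reordered as (neighbor 2, neighbor 0, neighbor 1)

# Soft version (illustrative values): each output slot is a weighted mix of neighbors.
P_soft = np.array([[0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.8, 0.1, 0.1]])
X_soft = X @ P_soft

print(X_hard)
print(X_soft)
```

A real-valued matrix of this shape can be trained by gradient descent, whereas a discrete permutation cannot, which is one reason to relax the permutation into a weighting matrix.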

12. LSA-Conv for 3D meshes
Local structure-aware anisotropic convolution (LSA-Conv):
$X_i P_i \to \tilde{X}_i$
$\mathrm{vec}(\tilde{X}_i)^{\top} W + b \to y_i$
[Figure: the one-ring neighbors $x_{i,0}, \ldots, x_{i,5}$ of a vertex are resampled by $P_i$ into the slots $x_0, \ldots, x_8$ of a 3×3 grid, to which the conventional convolution operation is applied to produce $y_i$.]
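
The two steps on this slide, resampling the neighbors with a per-vertex matrix and then applying a shared filter, can be sketched as a forward pass in NumPy. This is a sketch under stated assumptions rather than the authors' implementation: the function name `lsa_conv`, the padding of neighbor lists to a fixed length K with index -1, the tensor shapes, and the random "learned" parameters are all hypothetical.

```python
import numpy as np

def lsa_conv(X, neighbors, P, W, b):
    """
    X:         (N, D) vertex features
    neighbors: (N, K) neighbor indices per vertex, padded with -1
    P:         (N, K, K) per-vertex weighting matrices
    W:         (D * K, D_out) shared filter, b: (D_out,) bias
    """
    N, D = X.shape
    # Append a zero row so that the padding index -1 picks up zero features.
    X_pad = np.vstack([X, np.zeros((1, D))])
    Xi = np.transpose(X_pad[neighbors], (0, 2, 1))   # (N, D, K): neighbors as columns
    Xi_tilde = Xi @ P                                # resample neighbors: X_i P_i
    # Flatten each resampled block; the flattening order only has to be
    # consistent, since W is learned against it.
    flat = Xi_tilde.reshape(N, -1)                   # (N, D * K)
    return flat @ W + b                              # vec(X~_i)^T W + b

# Hypothetical toy mesh: 5 vertices, at most K = 4 neighbors each.
neighbors = np.array([[1, 3, 4, -1],
                      [0, 2, 4, -1],
                      [1, 3, 4, -1],
                      [0, 2, 4, -1],
                      [0, 1, 2, 3]])
N, K, D, D_out = 5, 4, 3, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(N, D))
P = rng.normal(size=(N, K, K))       # stands in for learned weighting matrices
W = rng.normal(size=(D * K, D_out))  # stands in for the learned shared filter
b = np.zeros(D_out)

Y = lsa_conv(X, neighbors, P, W, b)
print(Y.shape)
```

Whether the centre vertex itself is included in the neighbor list is left to the caller in this sketch. Because `W` assigns different weights to different slots after resampling, the filter is anisotropic, in contrast to the GCN filter that treats all neighbors alike.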

13. Architecture of autoencoder as the testbed

14 .Experiments Datasets DFAUST: A human body dataset that collects over 40,000 real meshes, capturing 129 dynamic performances from 10 subjects COMA: A human facial dataset that consists of 12 classes of extreme expressions from 12 different subjects Training Adam optimizer, learning rate 0.001 with decay rate 0.99 in every epoch Batch size is 32 total epoch number is 300. 14

15. Results

16. Results

17. Results

18. Results

19. Results

20. Parameter reduction for LSA-Conv (LSA-small)
$v_i P_B \to P_i$
$X_i P_i \to \tilde{X}_i$
$\mathrm{vec}(\tilde{X}_i)^{\top} W + b \to y_i$
[Figure: as for LSA-Conv, the one-ring neighbors $x_{i,0}, \ldots, x_{i,5}$ are resampled by $P_i$ into the grid slots $x_0, \ldots, x_8$, but $P_i$ is now obtained from $v_i$ and $P_B$.]
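
One way to read $v_i P_B \to P_i$ is that each per-vertex matrix is a combination of a small set of shared basis matrices, weighted by per-vertex coefficients. The sketch below follows that reading, which is an assumption: the slide does not give the shapes, and the sizes `N`, `K`, `M` and the random values are hypothetical.

```python
import numpy as np

# Hypothetical sizes: N vertices, K neighbor slots, M shared basis matrices.
N, K, M = 5000, 10, 4
rng = np.random.default_rng(0)

v = rng.normal(size=(N, M))        # per-vertex coefficients v_i (assumed shape)
P_B = rng.normal(size=(M, K, K))   # shared basis P_B (assumed shape)

# P_i = sum_m v_i[m] * P_B[m], computed for all vertices at once.
P = np.einsum("nm,mkl->nkl", v, P_B)   # (N, K, K)

# Parameter count for the weighting matrices under this reading.
full_params = N * K * K                # one free K x K matrix per vertex
small_params = N * M + M * K * K       # coefficients plus shared basis

print(P.shape, full_params, small_params)
```

With these illustrative sizes the weighting matrices need 20,400 parameters instead of 500,000, which is the kind of saving the slide title refers to; the actual ratio depends on the sizes used in the paper.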

21. Results

22. Ablation study

23. Applications

24. Applications: 3D shape correspondences
For the FAUST-inter dataset, we achieve 2.501 cm, an improvement of 15% over the 2.878 cm of Groueix et al. (2018).
Groueix et al. (2018) simply uses PointNet as the decoder.

25. Applications: monocular 3D face reconstruction

26. Learning neural dictionary

27. Results

28. Results

29. 3D geometric data: point clouds
LiDAR point clouds
