Exploring a New Neural Network Operator: involution

Convolution, as the core building block of modern neural networks, has driven the wave of deep learning advances in computer vision. In this work, we rethink the inherent properties of the standard convolution kernel along the spatial and channel dimensions, namely spatial agnosticism and channel specificity.

Inverting these two design principles, we propose a novel neural network operator called "involution". We further show that the widely used self-attention operation can be subsumed under involution as a complicated instantiation.
The proposed involution operator can serve as a fundamental building block in place of plain convolution to construct a new generation of vision networks, powering a variety of deep learning models across visual tasks, including ImageNet image classification, COCO object detection and instance segmentation, and Cityscapes semantic segmentation. Compared with the convolutional networks built on ResNet-50, involution-based deep networks improve recognition accuracy by 1.6%, bounding-box AP by 2.5% and 2.4%, and mean IoU by 4.7% on the above tasks, while reducing the computational cost to 66%, 65%, 72%, and 57%, respectively.

Source code and pre-trained models for the related visual tasks are released at https://github.com/d-li14/involution

Duo Li is a second-year graduate student in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology. He received his bachelor's degree from the Department of Automation at Tsinghua University, has published 10 papers at top conferences including ICCV, CVPR, and ECCV, and has interned at Intel, NVIDIA, SenseTime, and ByteDance. He received the 2020 CCF-CV Rising Star Award. More information can be found on his personal homepage at https://duoli.org



2. involution: Inverting the Inherence of Convolution for Visual Recognition
Duo Li, Jie Hu, Changhu Wang, Tong Zhang, Qifeng Chen, et al.

3. Convolution for Visual Recognition
• Convolution operation: input, output, convolution kernel, receptive field (see the formulation below)
• Two inherent properties:
  • spatial-agnostic: same kernel for different positions
  • channel-specific: different kernels for different channels
(Figure: Conv arithmetic, Dumoulin et al., arXiv'16)
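The definitions referenced above were formulas rendered inside the slide image and are not preserved in this transcript; the following is a reconstruction of the standard per-pixel convolution formulation, with the symbols X (input), Y (output), F (kernel) and the offset set Δ_K assumed rather than copied from the slide:

    % K x K convolution written per output pixel (notation assumed, consistent with the slide's bullets)
    % X \in \mathbb{R}^{H \times W \times C_i},\; Y \in \mathbb{R}^{H \times W \times C_o},\; F \in \mathbb{R}^{C_o \times C_i \times K \times K}
    Y_{i,j,k} = \sum_{c=1}^{C_i} \sum_{(u,v) \in \Delta_K}
                F_{k,\,c,\,u+\lfloor K/2 \rfloor,\,v+\lfloor K/2 \rfloor}\, X_{i+u,\,j+v,\,c},
    \qquad \Delta_K = \bigl[-\lfloor K/2 \rfloor,\, \lfloor K/2 \rfloor\bigr]^2 .

Here F is reused at every spatial position (i, j), which is the spatial-agnostic property, while a distinct filter is learned for every output channel k, which is the channel-specific property.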

4. Principles of Convolution
• spatial-agnostic:
  • pros: parameter efficiency, translation equivalence
  • cons: inflexible kernel weight, limited spatial span
• channel-specific:
  • pros: information encoding
  • cons: inter-channel redundancy
Long-range and self-adaptive relationship modeling is desired, provided that computational efficiency is prioritized.

5. Neural Primitives
Are there any alternatives for visual recognition? Even better than conventional convolution?
Yes! involution is all you need!

6. Involution for Visual Recognition
• Two inverted properties from convolution:
  • spatial-specific: kernel privatized for different positions
  • channel-agnostic: kernel shared across different channels
• involution operation and involution kernel: see the formulation below
• #groups: G
• kernel generated based on the input feature map -> kernel size aligned with the input tensor size
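The involution operation and kernel referenced above were likewise formulas in the slide; a reconstruction in the same notation as the convolution sketch earlier, with the generated kernel written as H and the group assignment of channel k assumed to be ⌈kG/C⌉, is:

    % Involution with G groups: the kernel H is generated per position (i, j) and shared within each channel group
    % H \in \mathbb{R}^{H \times W \times K \times K \times G}
    Y_{i,j,k} = \sum_{(u,v) \in \Delta_K}
                H_{i,\,j,\,u+\lfloor K/2 \rfloor,\,v+\lfloor K/2 \rfloor,\,\lceil kG/C \rceil}\, X_{i+u,\,j+v,\,k}.

In contrast to convolution, the kernel now depends on the position (i, j) (spatial-specific) and is shared by all C/G channels within a group (channel-agnostic).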

7. A Simple Yet Effective Formulation
• kernel generated from a single pixel with an MLP function (see the sketch below)
• linear transforms
• non-linearity
• channel-to-space reshape
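A minimal sketch of the kernel generation function outlined by these bullets, assuming a bottleneck of two linear transforms W_0, W_1 with reduction ratio r and a non-linearity σ (these symbols are not taken from the slide):

    % Kernel generated from the single pixel X_{i,j} \in \mathbb{R}^{C}
    H_{i,j} = \phi(X_{i,j}) = W_1\, \sigma(W_0 X_{i,j}),
    \qquad W_0 \in \mathbb{R}^{(C/r) \times C},\quad W_1 \in \mathbb{R}^{(K^2 G) \times (C/r)},
    % followed by a channel-to-space reshape of the K^2 G output entries into a K x K x G kernel at (i, j).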

8. Ablation Analysis (tables of #parameters and FLOPs)

9. Implementation (pure PyTorch APIs, for illustration only; see the sketch below)
A customized CUDA kernel is expected in favor of memory and speed.
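The code listing on this slide is not preserved in the transcript. Below is a hedged, self-contained sketch of an involution layer written with pure PyTorch APIs (unfold-based multiply-add with a kernel generated from the input); the module name Involution2d and hyper-parameters such as group_channels and reduction_ratio are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Sketch of an involution layer (assumed interface, not the official code)."""

    def __init__(self, channels, kernel_size=7, stride=1,
                 group_channels=16, reduction_ratio=4):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.groups = channels // group_channels
        # Kernel generation: a bottleneck MLP applied per (optionally pooled) pixel,
        # emitting K*K weights for each of the G groups ("channel-to-space" reshape below).
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels // reduction_ratio, 1),
            nn.BatchNorm2d(channels // reduction_ratio),
            nn.ReLU(inplace=True),
        )
        self.span = nn.Conv2d(channels // reduction_ratio,
                              kernel_size * kernel_size * self.groups, 1)
        self.down = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
        self.unfold = nn.Unfold(kernel_size, padding=(kernel_size - 1) // 2, stride=stride)

    def forward(self, x):
        b, c, h, w = x.shape
        h_out, w_out = h // self.stride, w // self.stride
        # Spatial-specific, channel-agnostic kernel generated from the input feature map itself.
        kernel = self.span(self.reduce(self.down(x)))            # B x (K*K*G) x H' x W'
        kernel = kernel.view(b, self.groups, 1,
                             self.kernel_size ** 2, h_out, w_out)
        # Unfold local neighborhoods, then multiply-add with the generated kernel.
        patches = self.unfold(x).view(b, self.groups, c // self.groups,
                                      self.kernel_size ** 2, h_out, w_out)
        out = (kernel * patches).sum(dim=3)                      # B x G x C/G x H' x W'
        return out.view(b, c, h_out, w_out)

As a quick check, Involution2d(64)(torch.randn(1, 64, 56, 56)) returns a tensor of the same shape. As the slide notes, a customized CUDA kernel that avoids materializing the unfolded patches is preferable for memory and speed.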

10. Relation to Self-Attention
• Multi-Head Self-Attention operation:
  • query position: (i, j), key position: (p, q)
  • query, key, value projections (see the formulation below)
  • #heads: H
• yet another instantiation of involution:
  • kernel generated from a patch of pixels with dense correspondence
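The query/key/value definitions above were formulas in the slide; a standard per-pixel form of multi-head self-attention consistent with the slide's symbols (head index h = 1, ..., H, query position (i, j), key position (p, q)) is sketched here, with the projection matrices W_Q, W_K, W_V and head dimension d assumed:

    % Per head h, with Q = X W_Q^{(h)}, K = X W_K^{(h)}, V = X W_V^{(h)}
    Y^{(h)}_{i,j} = \sum_{(p,q) \in \Omega}
                    \operatorname{softmax}_{(p,q)}\!\left( \frac{Q_{i,j} \cdot K_{p,q}}{\sqrt{d}} \right) V_{p,q}
    % The softmax-normalized affinity over (p, q) acts as an involution kernel that is generated
    % from a patch of pixels (dense query-key correspondence) rather than from a single pixel.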

11. Comparison to Self-Attention
• similarities:
  • heads of self-attention (H) <-> groups of involution (G)
  • affinity matrix of self-attention <-> kernel of involution
• differences:
  • w/o pixel-to-pixel relation -> a large and dynamic kernel is sufficient
  • w/o position encoding -> the output of the kernel generation function is ordered
Our involution is a more compact and tidy neural operator!

12. Image Classification
RedNet achieves the best Pareto frontier regarding the accuracy-efficiency trade-off, compared to both the baseline ResNet and self-attention based models.

13. High-compute Regime vs. SOTA Transformers
✓ long period (300 epochs)
✓ RandAug
✓ Mixup/CutMix
✓ Label Smoothing
✓ Stochastic depth
✓ …

14. Object Detection and Instance Segmentation
✓ involution demonstrates its effectiveness in different components of a detector
✓ the performance enhancement for large objects is the most significant
✓ fully involution-based detectors are highly efficient (~40% computational cost)

15. Semantic Segmentation
✓ involution also demonstrates its effectiveness in different components of a segmentor
✓ the performance margins for large objects are also the most significant (up to 10~20%)
✓ possibly more efficient than self-attention based segmentation networks

16. Visualization of Involution Kernels
✓ position-aware: automatically pays attention to crucial parts in the spatial range
✓ semantic-discriminative: encodes different semantic information in different groups

17. Summary
• A brave new neural operator, more effective and efficient than convolution, simpler than self-attention
• The design of involution bridges convolution and self-attention
• Fully open-source project: https://github.com/d-li14/involution
  • All source code/pre-trained models for all the considered tasks

18. Thanks!
Code: https://github.com/d-li14/involution
Homepage: https://duoli.org

