1. Huiqi Deng*, Qihan Ren*, Hao Zhang, Quanshi Zhang†
Shanghai Jiao Tong University
* Equal contribution. † Corresponding author.
2. Background and Motivation
• As the amount of data grows, deep neural networks (DNNs) outperform traditional models.
• Why the superior performance? A common hypothesis: DNNs have greater representation capacities.
3. Background and Motivation
• How to study the representation capacities of DNNs?
• Prior studies each reflect the representation capacity from a certain perspective:
⚫ Parameter complexity and overfitting
⚫ Generalization ability
⚫ Adversarial robustness (e.g., a "panda" image plus an imperceptible perturbation is misclassified as "gibbon")
4. Motivation
• Unlike previous studies, we focus on the following questions:
⚫ Are there common tendencies of DNNs in representing concepts, i.e., which types of concepts are (un)likely to be encoded in DNNs?
⚫ Does a DNN encode visual concepts similar to those used by human beings for image classification?
5. Conclusions
• Types of concepts are characterized by the complexity of interactions.
⚫ Which types of concepts are (un)likely to be encoded in DNNs?
✓ Simple interactions  × Middle-complex interactions  ✓ Complex interactions
⚫ Does a DNN encode visual concepts similar to those used by human beings for image classification?
The visual concepts encoded in a DNN ≠ those used by human beings.
6. Interactions
• Interactions and interaction concepts: the inference of a DNN
× does not treat input variables as working independently;
✓ encodes interactions between input variables to form interaction concepts (e.g., patches forming a head) for inference.
• The interaction utility measures how much the presence of variable j changes the importance of variable i.
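The interaction utility mentioned on the slide can be written explicitly. Writing f(S) for the network output when only the variables in S are present (the rest masked), the standard definition in this line of work is:

```latex
% Interaction utility of variables i and j under a context S \subseteq N \setminus \{i, j\}:
% how much the presence of j changes the marginal effect of i.
\Delta f(i, j, S) \;=\; f(S \cup \{i, j\}) \;-\; f(S \cup \{i\}) \;-\; f(S \cup \{j\}) \;+\; f(S)
```

If Δf > 0 the two variables collaborate under context S, if Δf < 0 they conflict, and if Δf = 0 they act independently.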
7. Multi-order interactions
• Complexity of interaction concepts: variables i and j may interact under a few contexts S (simple interactions) or under large numbers of contexts S (complex interactions).
• Multi-order interactions represent this complexity: the importance of i changed by j = (the importance of i when j is present) − (the importance of i when j is absent), averaged over contexts S of a fixed size m, called the order.
8. Multi-order interactions
• The order m of a multi-order interaction measures its complexity:
⚫ A small m (low order): a simple collaboration between a few variables.
⚫ A large m (high order): a complex collaboration between a massive number of variables.
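The multi-order interaction averages the interaction utility over all contexts of a fixed size m:

```latex
% Multi-order interaction of order m between variables i and j:
% expectation of the interaction utility over contexts S of size m.
I^{(m)}(i, j) \;=\; \mathbb{E}_{S \subseteq N \setminus \{i, j\},\; |S| = m}\!\left[\, \Delta f(i, j, S) \,\right]
```

A low order m thus captures collaborations measured under small contexts (simple concepts), while a high order m captures collaborations measured under nearly the full set of variables (complex concepts).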
9. Output vs. multi-order interactions
• Efficiency axiom of multi-order interactions: the network output can be decomposed into independent utilities plus the utilities of multi-order interactions of different orders (i.e., interaction concepts of different complexities): small m (low-order, simple), medium m (middle-order, middle-complex), and large m (high-order, complex).
➢ Therefore, interaction concepts can be exactly categorized into concepts of low order (simple), middle order (middle-complex), and high order (complex).
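The efficiency axiom can be sketched as a decomposition of the following form, where μ_i denotes the independent utility of variable i and w^{(m)} denotes order-dependent weights; the exact closed form of the weights is given in the paper and is not reproduced here:

```latex
% Efficiency axiom (sketch): the output gap decomposes into independent utilities
% plus weighted multi-order interactions of every order m.
f(N) - f(\emptyset) \;=\; \sum_{i \in N} \mu_i \;+\; \sum_{i \neq j} \sum_{m=0}^{n-2} w^{(m)}\, I^{(m)}(i, j)
```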
10. Discovering the bottleneck
• J^(m): the relative interaction strength of the m-th order.
• Representation bottleneck: a DNN usually encodes strong low-order and strong high-order interactions, but encodes weak middle-order interactions.
• The representation bottleneck phenomenon is widely shared by different DNN architectures trained on different datasets.
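The multi-order interaction can be estimated by Monte Carlo sampling over masked contexts. Below is a minimal sketch; the toy model `f_and`, the sample count, and the sampling scheme are illustrative assumptions, not the paper's exact measurement protocol:

```python
import random
from statistics import mean

def delta_f(f, i, j, S):
    """Interaction utility Δf(i,j,S): change in i's marginal effect caused by j's presence."""
    S = frozenset(S)
    return f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)

def multi_order_interaction(f, n, i, j, m, n_samples=200, seed=0):
    """Monte Carlo estimate of I^(m)(i,j): average Δf over random contexts S of size m."""
    rng = random.Random(seed)
    others = [k for k in range(n) if k not in (i, j)]
    return mean(delta_f(f, i, j, rng.sample(others, m)) for _ in range(n_samples))

# Toy model with a pure AND interaction between variables 0 and 1:
# the output is 1 only when both variables are present.
f_and = lambda S: 1.0 if {0, 1} <= S else 0.0
```

For `f_and`, Δf(0,1,S) = 1 − 0 − 0 + 0 = 1 under every context, so the estimated I^(m)(0,1) equals 1 at every order; for a purely additive model, Δf is 0 everywhere.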
11. Bottleneck → cognition gap
• Cognition gap: DNNs and humans encode different types of interaction patterns for inference.
• We examine whether humans/DNNs can extract new information from additional new patches under contexts of different sizes: a few patches, a middle number of patches, or massive patches.
12. Explaining the bottleneck
• Proof: the change of network weights can be decomposed into the sum of gradients of multi-order interactions w.r.t. the weights.
• The training strength of learning the m-th-order interactions is much higher when the order m is small or large, and much lower when m is medium, which explains the bottleneck.
13. Explaining the bottleneck
• Verification of the theory: we simulate the theoretical training strength (in Theorem 1) and compare it with the empirical interaction strength in real applications, with simulations of the distributions based on curves measured on ImageNet.
14. Training DNNs to encode specific orders of interactions
• Can we force a DNN to encode interactions of specific orders?
• We prove that a DNN trained with a given loss mainly encodes interactions of certain orders.
15. Training DNNs to encode specific orders of interactions
• We design two losses, one to encourage and one to penalize interactions of specific orders, which are added to the task loss to form the total loss.
• In experiments, we found that the two losses usually could encourage/penalize interactions of the intended orders.
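The encourage/penalize idea can be sketched as an auxiliary loss term built from the interaction utility. The order band, the sampling scheme, and the sign convention below are illustrative assumptions, not the paper's exact losses:

```python
import random

def delta_f(f, i, j, S):
    """Interaction utility Δf(i,j,S) of variables i and j under context S."""
    S = frozenset(S)
    return f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)

def interaction_penalty(f, n, order_band, n_samples=100, seed=0):
    """Average |Δf| over sampled variable pairs and contexts whose order lies in order_band.
    Adding +λ·penalty to the task loss suppresses interactions of these orders;
    adding −λ·penalty encourages them instead."""
    lo, hi = order_band
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        i, j = rng.sample(range(n), 2)          # random variable pair
        m = rng.randint(lo, min(hi, n - 2))     # random order inside the band
        others = [k for k in range(n) if k not in (i, j)]
        total += abs(delta_f(f, i, j, rng.sample(others, m)))
    return total / n_samples
```

A purely additive model incurs zero penalty at every order, since all of its interaction utilities vanish; a model with an AND-style interaction incurs a positive penalty whenever the interacting pair is sampled.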
16. Investigating representation capacities
We investigate the representation capacities of four types of DNNs:
• Normal DNN: a normally trained DNN.
• Low-order DNN: penalize high-order interactions.
• Middle-order DNN: encourage middle-order interactions.
• High-order DNN: penalize low-order interactions.
17. Investigating representation capacities
Part I: Classification accuracy.
➢ The four types of DNNs achieved similar accuracies.
➢ Middle-order interactions can also provide discriminative information.
18. Investigating representation capacities
Part II: Adversarial robustness.
➢ High-order interactions are vulnerable to adversarial attacks.
➢ Low-order interactions are more robust to adversarial attacks.
19. Investigating representation capacities
Part III: Bag-of-words vs. structural representations (random masking vs. surrounding masking).
➢ The high-order DNN encodes more structural information.
20. Investigating representation capacities
Part III: Bag-of-words vs. structural representations (random masking vs. surrounding masking).
➢ The low-order DNN prefers a bag-of-words representation.
21. Conclusions
• Discover a representation bottleneck phenomenon of DNNs.
• Theoretically explain the representation bottleneck.
• Propose losses to force DNNs to encode interactions of specific orders.
• Investigate the representation capacities of low-order, middle-order, and high-order DNNs.