Exploring the Robustness of Deep Visual Graph Matching, with an Outlook from a Data-Centric Perspective
From the image domain to the graph domain, the robustness of deep neural models has become an important issue: the pervasiveness of adversarial noise exposes deployed models to malicious attacks. Yet as an area at the intersection of images and graphs, the robustness of deep visual graph matching remains under-explored. We first design adversarial attacks on keypoint locations and on the hidden graph structure, substantially degrading the performance of deployed deep models. We then analyze the patterns of these attacks and design an appearance-aware regularizer that identifies easily confused keypoints and explicitly enlarges their distance in the hidden space. Extensive experiments demonstrate the effectiveness of our method. Moreover, our locality attack can serve as a data augmentation that further improves the matching accuracy of the state-of-the-art model. Building on this, I will discuss, from a data-centric perspective, possible extensions of the current methodology and its potential applicability to other domains, as an outlook.
Code is open-sourced at https://github.com/Thinklab-SJTU/robustMatch
Qibing Ren is a graduate student at the AI Institute of Shanghai Jiao Tong University, advised by Associate Professor Junchi Yan. His research interests center on machine learning robustness and data privacy. He has published one first-author CVPR paper and has received the National Scholarship, the Shanghai Scholarship, and the university's Class-A Scholarship.
1. Exploring the Robustness of Deep Visual Graph Matching, with a Data-Centric Outlook
Qibing Ren, Shanghai Jiao Tong University. May 11, 2022
2. Background: Visual graph matching
Visual graph matching finds correspondences between images by matching graphs built on their keypoints (a toy example of the output representation follows below).
Figure credit: Jiaxin Lu, SJTU
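To make the output concrete, here is a minimal sketch, not from the talk: the correspondence is conventionally encoded as a 0/1 permutation (assignment) matrix, and matching accuracy counts recovered ground-truth matches. The toy matrices are illustrative.

```python
import numpy as np

def matching_accuracy(X_pred: np.ndarray, X_gt: np.ndarray) -> float:
    """Fraction of ground-truth matches recovered by the prediction.
    X[i, j] = 1 means keypoint i in the source image matches keypoint j
    in the target image."""
    assert X_pred.shape == X_gt.shape
    return float((X_pred * X_gt).sum() / X_gt.sum())

X_gt = np.eye(3)                      # keypoint i matches keypoint i
X_pred = np.array([[1, 0, 0],
                   [0, 0, 1],         # keypoint 1 mismatched to keypoint 2
                   [0, 1, 0]])
print(matching_accuracy(X_pred, X_gt))  # 0.333...
```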
3. Background: Deep visual graph matching
❏ Current state of the art [1][2] follows a three-stage pipeline (a graph-construction sketch follows below):
• Keypoint feature extractor: VGG16 with SplineConv
• Graph construction: Delaunay triangulation
• Graph matching solver: GNN solver or black-box combinatorial solver
Input: images with annotated keypoints. Output: keypoint correspondence.
[1] Runzhong Wang, J. Yan, X. Yang. "Neural Graph Matching Network: Learning Lawler's Quadratic Assignment Problem with Extension to Hypergraph and Multiple-Graph Matching." TPAMI 2021.
[2] Rolínek et al. "Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers." ECCV 2020.
Figure credit: Runzhong Wang, SJTU
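As a hedged sketch of the graph-construction stage described on the slide: edges are built over annotated keypoint coordinates via Delaunay triangulation. This uses SciPy rather than the talk's released code, and the keypoints are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(keypoints: np.ndarray) -> set:
    """Return the undirected edge set of the Delaunay triangulation
    over 2-D keypoint coordinates of shape (n, 2)."""
    tri = Delaunay(keypoints)
    edges = set()
    for simplex in tri.simplices:          # each simplex is a triangle (i, j, k)
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = sorted((simplex[a], simplex[b]))
                edges.add((i, j))
    return edges

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(delaunay_edges(pts))
```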
4. Background: Adversary threat model
❏ Adversary goal: evasion attack
• Natural risk under standard training (ST): $R_{\mathrm{nat}}(f, D) = \mathbb{E}_{(x,y)\sim D}[L(f(x), y)]$
• Evasion attack at test time: $\max_{\tilde{D}^{\mathrm{test}} \in \mathcal{B}(D^{\mathrm{test}}, \epsilon)} R_{\mathrm{nat}}(f_{D^{\mathrm{train}}}, \tilde{D}^{\mathrm{test}})$, s.t. $f_{D^{\mathrm{train}}} = \arg\min_f R_{\mathrm{nat}}(f, D^{\mathrm{train}})$
❏ Adversary capabilities:
• Similarity metric: $\mathcal{B}(D, \epsilon) = \{x' : d(x, x') \le \epsilon, \forall x \in D\}$
• Common practice for vision data: on image pixels, $d(x, x') = \|x - x'\|_p$ (a numeric check follows below)
• Common practice for graph data: node injection, edge manipulation (addition or deletion), etc.
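A small numeric illustration of the similarity constraint: an adversarial example must stay within an $\ell_p$ ball of radius $\epsilon$ around the clean input. The helper `in_ball` is a hypothetical name for this check.

```python
import numpy as np

def in_ball(x: np.ndarray, x_adv: np.ndarray, eps: float, p=np.inf) -> bool:
    """Check membership in the L_p ball B(x, eps) from the threat model."""
    return np.linalg.norm((x_adv - x).ravel(), ord=p) <= eps

x = np.zeros((3, 3))
x_adv = x + 8 / 255 * np.sign(np.random.randn(3, 3))  # FGSM-style perturbation
print(in_ball(x, x_adv, eps=8 / 255))  # True: perturbation stays in the ball
```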
5. Background: Adversary threat model
❏ Adversary knowledge: white-box attack
• Fast Gradient Sign Method (FGSM): $x' = x + \alpha \cdot \mathrm{sign}(\nabla_x L(f(x), y))$, s.t. $x' \in \mathcal{B}(x, \epsilon)$
• Projected Gradient Descent (PGD): $x'_{t+1} = \Pi_{\epsilon}(x'_t + \alpha \cdot \mathrm{sign}(\nabla_{x'_t} L(f(x'_t), y)))$ (a PyTorch sketch follows below)
Figure. Left: natural image. Middle: adversarial perturbation found by the PGD attack. Right: adversarial example.
Figure. Trajectory visualization of a PGD attack on the loss surface. Image from https://towardsdatascience.com/know-your-enemy-7f7c5038bdf3
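Here is a minimal PyTorch sketch of the PGD update rule above; `model` and `loss_fn` are placeholders for any differentiable model and loss, not the talk's released implementation.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = x.clone().detach()
    # random start inside the eps-ball (common PGD practice)
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # signed gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
            x_adv = x_adv.clamp(0, 1)                 # keep a valid pixel range
    return x_adv.detach()
```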
6. Background: Adversarial defense
❏ Proactive defense:
• Adversarial risk under adversarial training (AT): $R_{\mathrm{adv}}(f, D^{\mathrm{train}}) = \mathbb{E}_{(x,y)\sim D^{\mathrm{train}}}[\max_{x' \in \mathcal{B}(x,\epsilon)} L(f(x'), y)]$
• Adversarial risk under TRADES: $R_{\mathrm{adv}}(f, D^{\mathrm{train}}) = \mathbb{E}_{(x,y)\sim D^{\mathrm{train}}}[L(f(x), y) + \max_{x' \in \mathcal{B}(x,\epsilon)} \mathrm{KL}(f(x') \,\|\, f(x)) / \lambda]$ (sketched in code below)
❏ A closer look at the decision boundary (DB):
• Current limitations: 1) robustness-accuracy trade-off; 2) robust generalization; 3) robustness overestimation
Figure. Left: a set of separable points. Middle: DB under ST. Right: DB under AT.
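A hedged PyTorch sketch of the TRADES objective above, assuming `x_adv` was produced by an inner-maximization attack such as the PGD sketch earlier; the KL direction follows the common TRADES implementation.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, lam=6.0):
    """Clean cross-entropy plus a KL term between adversarial and clean
    predictions, weighted by 1/lambda as in the slide's formula."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    natural = F.cross_entropy(logits_clean, y)
    robust = F.kl_div(F.log_softmax(logits_adv, dim=1),
                      F.softmax(logits_clean, dim=1),
                      reduction='batchmean')
    return natural + robust / lam
```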
7. Vulnerabilities of deep visual GM
❏ Challenge: edge-manipulation and node-injection attacks are NOT feasible for visual GM.
• Our solution: attack the hidden graph structure G by perturbing the keypoint locality z.
• The attack objective (a step sketch follows below): $\max_{c', z'} \max_{G'} L(f(c', z', G'), y)$, s.t. $d_\infty(c', c) \le \epsilon_c$, $d_\infty(z', z) \le \epsilon_z$
Figure. Pixel attack ($\epsilon = 8/255$) vs. locality attack ($\epsilon = 8$): correct matches on the cat instance drop from 9/11 to 2/11; arrows mark attack directions, with newly added and deleted edges highlighted.
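A hedged sketch of the keypoint-locality attack idea: perturb keypoint coordinates z within an $\ell_\infty$ ball of radius $\epsilon_z$; since the graph G is rebuilt by Delaunay triangulation from z, moving keypoints implicitly adds and deletes edges. Differentiating the matching loss through the solver is omitted; `grad_z` stands in for $\nabla_z L$.

```python
import numpy as np
from scipy.spatial import Delaunay

def locality_step(z, z_orig, grad_z, alpha=1.0, eps_z=8.0):
    """One FGSM-style step on keypoint coordinates (n, 2),
    projected back onto the L_inf ball of radius eps_z."""
    z_new = z + alpha * np.sign(grad_z)
    return z_orig + np.clip(z_new - z_orig, -eps_z, eps_z)

def rebuild_graph(z):
    """Hidden graph structure G induced by the perturbed keypoints."""
    return Delaunay(z).simplices
```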
8. Towards robustness of deep visual GM
❏ Challenge: defenses designed for a single graph are NOT feasible for two-graph (multi-graph) matching.
• Key observation: appearance-similar keypoints are easily mismatched with each other.
Figure. Visualizations of matching results before and after attack, at the sample level.
Figure. Visualizations of matching results before and after attack, at the statistic level.
9. Appearance-aware regularizer (AAR)
❏ Key insight: appearance-similar keypoints can be discovered from attack priors (a code sketch of the idea follows below).
Figure. Working pipeline: Step 1: attack; Step 2: Hungarian matrix; Step 3: appearance-aware matrix; Step 4: appearance-similar groups (keypoints p1, p2, p3 against candidates a-e), which feed the AAR.
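A heavily hedged sketch of the idea behind AAR, not the paper's exact formulation: given groups of appearance-similar keypoints discovered from attack priors, penalize small pairwise distances between their embeddings, explicitly pushing confusable keypoints apart in the hidden space.

```python
import torch

def aar_sketch(embeddings, groups, margin=1.0):
    """embeddings: (n, d) keypoint features; groups: list of index lists,
    each holding keypoints that attacks tend to confuse with each other."""
    penalty = embeddings.new_zeros(())
    for group in groups:
        for a in range(len(group)):
            for b in range(a + 1, len(group)):
                dist = torch.norm(embeddings[group[a]] - embeddings[group[b]])
                penalty = penalty + torch.relu(margin - dist)  # hinge on distance
    return penalty
```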
10. Proposed framework: ASAR-GM
❏ ASAR-GM: our proposed AAR is orthogonal to adversarial training.
• Min-max optimization framework (a training-step sketch follows below):
$\min_\theta L(f(c', z', G'), y) + \beta \cdot \mathrm{AAR}(f(c', z', G'), y)$,
s.t. $(c', z', G') = \arg\max_{c', z'} \max_{G'} L(f(c', z', G'), y)$
• A burn-in period yields a better trade-off between accuracy and robustness.
Figure. ASAR-GM pipeline: adversarial example generation (pixel attack, $\epsilon = 8/255$; locality attack, $\epsilon = 4$) feeds the deep GM model (feature extractor, affinity, doubly-stochastic matrix, correspondence solver), trained with a cross-entropy matching loss against the ground-truth matching plus the appearance-aware regularizer.
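A hedged sketch of one outer-minimization step of the framework above: generate adversarial examples with the combined pixel-plus-locality attack (inner max), then descend on the matching loss plus $\beta$ times the AAR after the burn-in period. `attack`, `matching_loss`, and `aar` are placeholder callables, not the released implementation.

```python
import torch

def asar_gm_step(model, optimizer, attack, matching_loss, aar,
                 images, keypoints, y, beta=1.0, burn_in_done=True):
    # inner maximization: worst-case pixels c' and keypoint locality z'
    adv_images, adv_keypoints = attack(model, images, keypoints, y)
    pred = model(adv_images, adv_keypoints)
    loss = matching_loss(pred, y)
    if burn_in_done:                      # AAR is switched on after burn-in
        loss = loss + beta * aar(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```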
11. Experiments on the Pascal VOC dataset
❑ Evaluation (a protocol sketch follows below):
• Clean accuracy: evaluation on the clean test set, without attack.
• Robust accuracy: evaluation on the worst-case (attacked) test set.
❑ Attack baselines:
• White-box: pixel, our locality, and combo attacks with varying numbers of attack iterations.
• Black-box: query-based Square Attack and transfer-based MI-FGSM.
• Adaptive: adversarial examples generated by maximizing our defense loss.
❑ Defense baselines:
• Standard-training GM baselines: PCA-GM, CIE-H, BBGM, etc.
• Adversarial training with variants of the inner maximization.
Table. White-box robust accuracy (%) under various attacks. (Annotated: obfuscated gradients!)
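A hedged sketch of the two evaluation metrics; `attack` and `acc_fn` are illustrative placeholders for any attack routine and per-batch accuracy function.

```python
import torch

@torch.no_grad()
def clean_accuracy(model, loader, acc_fn):
    """Accuracy on the clean test set, without attack."""
    return sum(acc_fn(model(x), y) for x, y in loader) / len(loader)

def robust_accuracy(model, loader, attack, acc_fn):
    """Accuracy on the worst-case test set produced by the attack."""
    total = 0.0
    for x, y in loader:
        x_adv = attack(model, x, y)       # gradients are needed inside the attack
        with torch.no_grad():
            total += acc_fn(model(x_adv), y)
    return total / len(loader)
```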
12. Experiments on the Pascal VOC dataset
Table. Black-box robust accuracy (%) under various attacks. (Annotated: obfuscated gradients!)
Table. White-box robust accuracy (%) under various attacks, for the ablation study.
Results:
1) Our AAR brings 2% higher accuracy and 7% higher robustness over AT.
2) Our locality attack, used as a data augmentation, sets a new state of the art at 81.82% accuracy.
3) Our locality attack is much stronger than the vanilla pixel attack against Pixel AT, causing a 17.32% accuracy drop.
4) ASAR-GM (ours) outperforms the baselines in all cases, with a 25.8% improvement on average.
13. Experiments on the Pascal VOC dataset
❏ Visualizations of more matching results.
Figure. Matching results of the baseline and of our robust model under attacks.
14. A data-centric view towards robustness
❑ Data quality matters for robustness:
• Human-annotated keypoint locality is sub-optimal; our locality attack induces more discriminative features by perturbing locality.
• Graphs constructed by Delaunay triangulation are vulnerable to small noise; our locality attack improves graph-structure diversity by perturbing locality.
❑ Good priors for graph construction:
• Delaunay triangulation delivers good locality, which helps the GM solver aggregate neighborhood bias for feature updating.
• Approximating the isomorphic topology structure may be the next step.
❑ From coarse- to fine-grained graph matching:
• Current methods overlook intra-graph keypoint interactions; each keypoint has its own semantic label.
• Our attack reveals semantic similarity by identifying appearance-similar keypoint groups.
Thanks for listening!