Transferable adversarial attack method based on attention mechanism and multi-model integration optimization

Multilingual Abstract

The susceptibility of deep neural networks (DNNs) to adversarial manipulation has been verified extensively. In black-box attack settings, the internal parameters of the target model are inaccessible, so adversaries typically rely on surrogate models to approximate the decision boundary and then craft adversarial inputs. Relying on a single surrogate model, however, often leads to local optima, which weakens the cross-model transferability of the crafted adversarial examples. To address this issue, we introduce an adversarial example generation framework termed Attention-based Multi-model feature integration (AMA). Our approach extracts attention weights from intermediate feature representations of surrogate models, integrates the attention maps from multiple models to identify common discriminative features, and then applies an optimization strategy to perturb these features, thus overcoming the limitations of single-model surrogates. Experimental evaluations show that AMA achieves up to a 17.4% improvement over the baseline, with an average attack success rate of 58.0% against diverse defense models, surpassing the strongest baseline by 5.1%. These results highlight the effectiveness of our method in enhancing both adversarial strength and transferability.
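
The abstract describes AMA only at a high level, so the following is a minimal sketch of its three stages (attention weight extraction, multi-model integration, feature perturbation) rather than the thesis's implementation. Everything concrete here is an assumption made for illustration: the toy linear surrogates, the names `surrogates`, `spatial_attention`, `integrated_attention`, `loss_and_grad`, and `attack`, the use of the head gradient as attention weights, summation as the integration rule, and the L-infinity budget `eps`.

```python
import numpy as np

rng = np.random.default_rng(0)
P, D, C, K = 16, 8, 4, 3  # positions, input dims per position, channels, surrogates

# Hypothetical toy surrogates (assumption): linear features f_k(x) = x @ W_k and
# a linear head whose true-class score is sum(V_k * f_k(x)).
surrogates = [
    {"W": rng.normal(size=(D, C)), "V": rng.normal(size=(P, C))}
    for _ in range(K)
]

def features(m, x):
    return x @ m["W"]  # intermediate feature map, shape (P, C)

def attention_weights(m, x):
    # Gradient of the true-class score w.r.t. the features. For this linear
    # head it equals V_k; a real network would need backpropagation here.
    return m["V"]

def spatial_attention(m, x):
    # Gradient-weighted feature response, summed over channels, rectified,
    # and scaled to [0, 1] so surrogates contribute on a comparable scale.
    a = np.maximum((attention_weights(m, x) * features(m, x)).sum(axis=1), 0.0)
    return a / (a.max() + 1e-12)

def integrated_attention(x):
    # Assumed integration rule: sum the per-surrogate maps, so positions that
    # several surrogates attend to receive the largest weight.
    return sum(spatial_attention(m, x) for m in surrogates)  # shape (P,)

def loss_and_grad(x_adv, attn, weights):
    # Attention-weighted feature response across all surrogates and its
    # analytic gradient w.r.t. the input (valid for the linear toy features).
    loss, grad = 0.0, np.zeros_like(x_adv)
    for m, w in zip(surrogates, weights):
        loss += float((attn[:, None] * w * features(m, x_adv)).sum())
        grad += attn[:, None] * (w @ m["W"].T)  # shape (P, D)
    return loss, grad

def attack(x, eps=0.1, steps=10):
    attn = integrated_attention(x)  # computed once on the clean input
    weights = [attention_weights(m, x) for m in surrogates]
    alpha = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, attn, weights)
        x_adv = x_adv - alpha * np.sign(g)         # suppress attended features
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project onto the eps-ball
    return x_adv, attn, weights

x = rng.uniform(0.2, 0.8, size=(P, D))
x_adv, attn, weights = attack(x)
print(loss_and_grad(x, attn, weights)[0], loss_and_grad(x_adv, attn, weights)[0])
```

Because the toy features are linear, the gradient is identical at every step and the loop reduces to a single signed step of size `eps`. With real, nonlinear surrogates the gradient changes with `x_adv`, which is why it is recomputed inside the loop. Clipping to a valid pixel range is omitted here for brevity.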




Table of Contents

List of Contents
List of figure
List of table
Acknowledgement
1. Introduction
1.1 Background and Motivation
1.2 Objectives and Contributions
1.3 Paper Organization
2. Related work
2.1 Deep Neural Networks
2.2 Adversarial Attack
2.2.1 Definition and Threat Models
2.2.2 Physical World Attacks
2.3 Performance Metrics
2.3.1 Adversarial samples and benign samples
2.3.2 Distance metrics
2.3.3 Transferability
2.3.4 Adversarial robustness
2.4 White-box Attack
2.4.1 Fast Gradient Sign Method (FGSM)
2.4.2 Basic Iteration Method (BIM)
2.4.3 Projected Gradient Descent (PGD)
2.5 Black-box Attack
2.5.1 Query-Based Attacks
2.5.2 Transfer-Based Attacks
2.5.3 Momentum Iterative Fast Gradient Sign Method (MI-FGSM)
2.5.4 Diverse Input Fast Gradient Sign Method (DI-FGSM)
2.5.5 Translation-Invariant Fast Gradient Sign Method (TI-FGSM)
2.5.6 Feature Disruption Attack (FDA)
2.5.7 Feature Importance-aware Attack (FIA)
2.6 Overview of Adversarial Defense Strategies
2.6.1 Active Defense: Adversarial Training
2.6.2 Passive Defense: Denoising and Input Preprocessing
2.7 Attention Mechanism Overview
2.7.1 Attention in Computer Vision
2.7.2 Attention in Adversarial Attacks
2.8 Overview of Multi-Model Integration
2.8.1 Ensemble Methods in Machine Learning
2.8.2 Ensemble in Adversarial Context
3. Research Methodology
3.1 Preparation
3.1.1 Problem Formulation
3.1.2 Threat Model and Assumptions
3.2 Attention weight extraction
3.2.1 Feature Map Selection
3.2.2 Gradient-based Attention Calculation
3.3 Multi-model feature integration
3.3.1 Weighted Feature Aggregation
3.3.2 Spatial Attention Map Generation
3.4 Key feature destruction
3.4.1 Combined Loss Function Design
3.4.2 Optimization with PGD
3.5 Analysis and Selection of Attention Ensemble Strategies
3.5.1 Strategy Comparison
3.5.2 Justification for Sum Strategy
4. Experiment
4.1 Experimental setup
4.1.1 Dataset
4.1.2 Evaluation Metrics
4.1.3 Baseline Methods
4.1.4 Target Model
4.1.5 Parameters
4.2 Analysis of Experimental Results
4.2.1 Visual Quality of Adversarial Examples
4.2.2 White-box vs. Black-box Performance
4.2.3 Performance against Defense Models
4.3 Attention Layer Experiment
4.3.1 Layer Selection Analysis
4.3.2 Trade-off between Attack Strength and Transferability
4.4 Integration experiments with different models
4.4.1 Model Combination Strategies
4.4.2 Scalability Analysis
4.5 Hyperparameters
4.5.1 Learning Rate and Iteration Analysis
4.5.2 Dynamic Learning Rate Strategy
4.6 Experimental ablation of attention integration strategy
4.6.1 Strategy Performance Comparison
4.6.2 Robustness Analysis
4.7 Analysis of the superiority of the AMA method
5. Conclusion
5.1 Theoretical contributions and technical limitations
5.2 Future extension
References
Appendix
A. The intuition behind transportability
B. substitute model training
ABSTRACT
