Reinforcement learning-based TSP problem solving with diffusion model|RISS 상세보기

다국어 초록 (Multilingual Abstract)

The Traveling Salesman Problem (TSP) is a cornerstone challenge in combinatorial optimization, recognized as an NP-hard problem. The rapid evolution of advanced artificial intelligence technologies has inspired a myriad of novel strategies to tackle this enduring issue. In this paper, we introduce an innovative approach that synergizes reinforcement learning (RL) with diffusion models to solve the TSP. This method employs a latent variable to generate an adjacency matrix, thus bridging the methodologies of image and graph learning in a unified RL framework. Our approach leverages a pre-trained diffusion model, utilizing it as a source of prior knowledge, which enhances computational efficiency and fosters remarkable generalization capabilities. We present experimental results on TSP datasets comprising various numbers of cities, where our method consistently surpasses existing solutions. This achievement not only deepens our comprehension of complex AI technique integration but also establishes a new benchmark for efficiently resolving NP-hard problems through cutting-edge artificial intelligence.

번역하기

국문 초록 (Abstract)

외판원 문제(TSP)는 조합 최적화 분야의 주요 과제로, NP-완전 문제로 인정받고 있다. 첨단 인공지능 기술의 급속한 발전은 이 오랜 문제를 해결하기 위한 다양한 새로운 전략들을 고안해냈다. 본 논문에서는 강화학습(RL)과 확산 모델을 통합하여 TSP를 해결하는 혁신적인 접근 방식을 소개한다. 이 방법은 잠재 변수를 사용하여 인접 행렬을 생성함으로써 이미지 학습과 그래프 학습 방법론을 단일 RL 프레임워크에서 통합한다. 본 접근법은 사전 지식의 원천으로 활용되는 사전 훈련된 확산 모델(Diffusion Model)을 활용하여 계산 효율성을 높이고 뛰어난 일반화 능력을 갖추고 있다. 다양한 도시 수를 포함하는 TSP 데이터셋에 대한 실험 결과를 제시하며, 본 방법이 기존 솔루션을 꾸준히 능가함을 보여준다. 이러한 성과는 복잡한 AI 기술 통합에 대한 이해를 심화시킬 뿐만 아니라, 첨단 인공지능을 통해 NP-완전 문제를 효율적으로 해결하는 새로운 기준을 수립한다.

번역하기

외판원 문제(TSP)는 조합 최적화 분야의 주요 과제로, NP-완전 문제로 인정받고 있다. 첨단 인공지능 기술의 급속한 발전은 이 오랜 문제를 해결하기 위한 다양한 새로운 전략들을 고안해냈다....

목차 (Table of Contents)

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2: Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2: Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Traveling Salesman Problem (TSP) . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Traditional Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Reinforcement Learning (RL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Theoretical Foundations of Reinforcement Learning . . . . . . . . . . . . . 8
2.2.2 Advancements and Applications of Reinforcement Learning in Solving TSP 11
2.3 Diffusion Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Theoretical Foundations of Diffusion Models . . . . . . . . . . . . . . . . . 13
2.3.2 Applications to TSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 3: Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Multi-step MDP for Training Diffusion Model . . . . . . . . . . . . . . . . . . . . . 21
3.2 Policy Network as a Denoising Process . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Initialization Using Diffusion Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Training Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 4: Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 Basic TSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Small-scale Problems (N = 20) . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 Medium-scale Problems (N = 50) . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.3 Large-scale Problems (N = 100 and N = 200) . . . . . . . . . . . . . . . . . . 32
4.1.4 Comparison with Other Methods . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.5 Robustness and Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Inference Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 TSP with Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1 Results for N = 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2 Results for N = 50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.3 Results for N = 100 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.4 Results for N = 200 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Chapter 5: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
국문초록 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

상세검색

RISS 보유자료

상세검색

해외전자자료

Reinforcement learning-based TSP problem solving with diffusion model

부가정보

분석정보

연관 공개강의(KOCW)

이 자료와 함께 이용한 RISS 자료

나만을 위한 추천자료