Environment-Agnostic Architecture for Heterogeneous Multi-Environment Reinforcement Learning

Abstract

Training a Reinforcement Learning (RL) agent from scratch in every new environment can be inefficient. If an agent can learn across diverse environments and transfer what it has learned, the computational and time costs of training can be reduced significantly. However, learning across multiple environments is challenging because different RL problems have different state and action spaces. Naive parameter sharing, in which environment-specific layers handle each environment's state and action spaces, does not effectively facilitate transfer learning. In this work, we present a flexible, environment-agnostic architecture that learns across multiple environments simultaneously and enables efficient transfer learning to new environments. We also develop training algorithms for the proposed architecture that support both online and offline RL. Our experiments demonstrate that a single agent can be trained across multiple heterogeneous environments and that naive parameter sharing is not effective for transfer learning.
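The abstract contrasts naive parameter sharing, where each environment gets its own layers for its state and action spaces, with an environment-agnostic design. The minimal Python sketch below illustrates that contrast on the input side only, under loudly stated assumptions: the helper names (`naive_env_specific_params`, `ScalarTokenEncoder`), the per-scalar "tokenization", the position embeddings, and the mean pooling are hypothetical choices made for illustration and are not the thesis' Arbitrary 1D Input-Output Agent. The state sizes 4 and 11 correspond to the observation vectors of CartPole and MuJoCo Hopper.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical sketch, not the thesis' architecture: compares the parameters
# that naive environment-specific input layers need with those of a single
# length-agnostic encoder shared by every environment.


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one dense layer."""
    return n_in * n_out + n_out


def naive_env_specific_params(state_dims: Dict[str, int], hidden: int) -> Dict[str, int]:
    """Naive sharing: each environment owns an input layer that maps its state
    to the shared hidden width. These weights are tied to one state size,
    so none of them can be reused by a new environment."""
    return {name: linear_params(dim, hidden) for name, dim in state_dims.items()}


class ScalarTokenEncoder:
    """Assumed environment-agnostic encoder: a state of any length is treated
    as a 1D sequence of scalars. Each scalar is scaled by one shared projection
    vector, a position embedding is added, and the tokens are mean-pooled into
    a fixed-width feature vector."""

    def __init__(self, width: int, max_len: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        self.max_len = max_len
        # One projection shared by every scalar in every environment.
        self.value_proj = [rng.gauss(0.0, 0.1) for _ in range(width)]
        # One embedding per position, also shared across environments.
        self.pos_emb = [[rng.gauss(0.0, 0.1) for _ in range(width)] for _ in range(max_len)]

    def encode(self, state: Sequence[float]) -> List[float]:
        if not 0 < len(state) <= self.max_len:
            raise ValueError(f"state length must be in 1..{self.max_len}")
        pooled = [0.0] * self.width
        for i, x in enumerate(state):
            for j in range(self.width):
                pooled[j] += x * self.value_proj[j] + self.pos_emb[i][j]
        # Mean pooling gives the same output width for any state length.
        return [v / len(state) for v in pooled]

    def num_params(self) -> int:
        return self.width + self.max_len * self.width


if __name__ == "__main__":
    dims = {"CartPole": 4, "Hopper": 11}
    print(naive_env_specific_params(dims, hidden=64))  # {'CartPole': 320, 'Hopper': 768}
    enc = ScalarTokenEncoder(width=64, max_len=32)
    for name, dim in dims.items():
        print(name, len(enc.encode([0.0] * dim)))  # 64 for both environments
    print(enc.num_params())  # 2112, independent of the number of environments
```

With a hidden width of 64, the naive scheme allocates 4 × 64 + 64 = 320 input parameters for a 4-dimensional state and 768 for an 11-dimensional one, and every additional environment requires a freshly initialized layer. The hypothetical encoder keeps a fixed set of 2112 parameters for any state of up to 32 scalars, so all of them are available for reuse when a new environment arrives. The action side could be handled analogously, but that is likewise an assumption of this sketch.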


Table of Contents

Abstract
Korean Abstract
Preface
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Motivation
1.2 Contributions
2 Background
2.1 Multi-Environment Reinforcement Learning
2.2 Proximal Policy Optimization
2.3 Implicit Q-Learning
2.4 Structured State Space Sequence Model
3 Methods
3.1 Environment-Agnostic Architecture for Heterogeneous Multi-Environment RL
3.1.1 Arbitrary 1D Input-Output Agent
3.1.2 Decentralized Distributed Algorithm for Heterogeneous Environments
3.1.3 DDPPO for Heterogeneous Environments
3.1.4 DDIQL for Heterogeneous Environments
3.1.5 Stabilizing Multi-Objective Optimization
4 Experiments
4.1 Experimental Setup
4.2 Online and Offline Multi-Environment Training
4.2.1 Online Multi-Environment Training
4.2.2 Offline Multi-Environment Training
4.3 Online Multi-Environment Pretraining and Transfer Learning
4.4 Ablation Study
5 Conclusion
References
Appendix
A Additional Experiment Results
A.1 Classic-to-MuJoCo
A.2 MuJoCo-to-Classic
B Ablation Study
B.1 Transfer Learning
B.2 Scratch Learning
