Multilingual Abstract

Proximal policy optimization (PPO) is a promising reinforcement learning algorithm. In this paper, we propose adding an action mask to the PPO algorithm. The mask indicates, for each state, whether an action is valid or invalid. Simulation results show that, compared with the original version, the proposed algorithm yields a much higher return with a moderate number of training steps. It is therefore useful to incorporate such a mask wherever applicable.
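The abstract does not say how the mask enters the policy. One common approach, assumed here rather than taken from the paper, is to set the logits of invalid actions to negative infinity before the softmax, so that those actions receive zero probability and are never sampled. The sketch below illustrates this with NumPy; the function names, logit values, and four-action state are hypothetical.

```python
import numpy as np


def masked_action_probs(logits, valid_mask):
    """Softmax over logits with invalid actions forced to probability zero."""
    logits = np.asarray(logits, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(valid_mask, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()


def sample_action(logits, valid_mask, rng):
    """Draw one action index from the masked policy distribution."""
    probs = masked_action_probs(logits, valid_mask)
    return int(rng.choice(len(probs), p=probs))


# Hypothetical state with four actions; action 2 is invalid here.
logits = [1.0, 2.0, 0.5, 3.0]      # made-up policy-network outputs
mask = [True, True, False, True]   # True = valid, False = invalid

probs = masked_action_probs(logits, mask)
rng = np.random.default_rng(0)
samples = [sample_action(logits, mask, rng) for _ in range(1000)]
```

Because the masked distribution is also what a PPO probability ratio would be computed from under this assumption, the same mask would have to be applied both when collecting rollouts and when evaluating the policy during the update.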

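Other articles in the same journal (volume/issue)

An improved backtracking search optimization algorithm for cubic metric reduction of OFDM signals
- 한국통신학회 (Korean Institute of Communications and Information Sciences)
- Hojjat Emami
- 2020
- KCI-listed, SCOPUS
Deep Learning for Radio Propagation: Using Image-Driven Regression to estimate path loss in urban areas
- 한국통신학회 (Korean Institute of Communications and Information Sciences)
- Sotirios P. Sotiroudis
- 2020
- KCI-listed, SCOPUS
Supervised ECG wave segmentation using convolutional LSTM
- 한국통신학회 (Korean Institute of Communications and Information Sciences)
- Aman Malali
- 2020
- KCI-listed, SCOPUS
Hockey activity recognition using pre-trained deep learning model
- 한국통신학회 (Korean Institute of Communications and Information Sciences)
- Keerthana Rangasamy
- 2020
- KCI-listed, SCOPUS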

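Journal history
Date	Type	Details
2023	Evaluation scheduled	Eligible to apply for overseas-database journal evaluation (evaluation of internationally indexed journals)
2020-01-01	Evaluation	KCI-listed journal status maintained (evaluation of internationally indexed journals)
2017-08-01	Evaluation	Indexed in SCOPUS (other)
2017-01-01	Evaluation	Selected as a KCI candidate journal (new evaluation)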

Implementing Action Mask in Proximal Policy Optimization (PPO) Algorithm
