RISS Search - Domestic Journal Article Details

Korean Abstract (Abstract)

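Today, reinforcement learning is studied and applied in a wide range of fields, including autonomous driving, robotics, and games. Reinforcement learning aims for an agent to find an optimal action policy by interacting with its environment, and depending on the environment and the problem, the more suitable of a policy-based or a value-based algorithm is selected. Policy-based algorithms can learn effectively in continuous, high-dimensional action spaces, but the learning-rate parameter strongly affects training, and convergence to an optimized policy becomes harder as the environment grows more complex. To address these problems, this paper proposes an action selection technique based on an annealing algorithm and a dynamic dense reward design. The proposed method was applied to two representative policy-based algorithms, A2C and PPO, and in the experiments both algorithms with the proposed method achieved higher performance than their standard counterparts.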

Multilingual Abstract

Reinforcement learning is now studied and applied in many fields, including autonomous driving, robotics, and gaming. Its goal is for an agent to find an optimal policy by interacting with its environment. Depending on the environment and the specific problem, either a policy-based or a value-based algorithm is selected. Policy-based algorithms can learn effectively in continuous, high-dimensional action spaces, but they are highly sensitive to the learning rate, and converging to an optimized policy becomes harder as the environment grows more complex. To address these issues, this paper proposes an action selection technique based on a simulated annealing algorithm, together with a dynamic dense reward design. The proposed method is applied to two representative policy-based algorithms, A2C and PPO, and experimental results show that both outperform their standard counterparts.

Research on Action Selection Techniques and Dynamic Dense Reward Application for Efficient Exploration in Policy-Based Reinforcement Learning
