RISS (Academic Research Information Service)

      • KCI-indexed

        A Survey of Model-Free Reinforcement Learning Applications in Games and Robotics

        김세원,이재길 한국정보과학회 2019 데이타베이스 연구 Vol.35 No.2

        Reinforcement learning is the learning process by which an agent perceives the current state of its environment and receives feedback on the actions it takes. Although reinforcement learning has been actively studied across many applications, games and robotics are especially well suited to it because they are easily expressed as Markov decision processes. Model-free reinforcement learning methods include Monte Carlo control, SARSA, Q-learning, and policy gradient, and the appropriate method is chosen for the problem at hand. Model-free algorithms, most notably deep Q-learning and policy gradient, are the ones mainly used to solve problems in games and robotics. However, reinforcement learning works poorly in environments where the reward is delayed or carries insufficient information, and future studies need to address these limitations. We also plan to investigate the state of the art of reinforcement learning in games and robotics in future work.
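
        The tabular updates this survey names are compact enough to quote for reference. Below is a minimal Python sketch of the Q-learning and SARSA updates in generic textbook form, not code from the paper; sizes and hyperparameters are illustrative.

        import numpy as np

        # Illustrative sizes and hyperparameters, not tied to any one task.
        n_states, n_actions = 16, 4
        alpha, gamma = 0.1, 0.99
        Q = np.zeros((n_states, n_actions))

        def q_update(s, a, r, s_next, done):
            """Q-learning: bootstrap from the best next action (off-policy)."""
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])

        def sarsa_update(s, a, r, s_next, a_next, done):
            """SARSA: bootstrap from the action actually taken (on-policy)."""
            target = r if done else r + gamma * Q[s_next, a_next]
            Q[s, a] += alpha * (target - Q[s, a])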

      • KCI-indexed

        Trends in Practical Reinforcement Learning Technology: From Imitation Learning to Offline Reinforcement Learning

        이동수,엄찬인,최성우,김성관,권민혜 한국통신학회 2023 韓國通信學會論文誌 Vol.48 No.11

        The reinforcement learning paradigm has recently shifted from online to offline. This change is meant to overcome the impracticality of online reinforcement learning, which has been limited to simulation-based game tasks (e.g., Go, chess, Atari, and so on). This paper reviews offline reinforcement learning, an approach that builds a policy by leveraging previously collected, fixed datasets. Specifically, we cover state-of-the-art offline reinforcement learning algorithms proposed to mitigate distributional shift. Lastly, we discuss the open problems and limitations of current offline reinforcement learning.
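
        The distributional shift mentioned here arises because the learned policy queries Q-values for actions the fixed dataset never contains. One widely cited remedy in this literature is Conservative Q-Learning (CQL, Kumar et al., 2020); a minimal sketch of its penalty term follows, assuming a hypothetical discrete-action network q_net (this is generic CQL, not an algorithm from the paper).

        import torch
        import torch.nn as nn

        # Hypothetical discrete-action Q-network; sizes are illustrative.
        q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

        def cql_penalty(states, dataset_actions):
            """CQL regularizer: push Q down on all actions (logsumexp term)
            and up on the actions actually present in the offline dataset."""
            q_all = q_net(states)                                   # (B, 4)
            q_data = q_all.gather(1, dataset_actions.unsqueeze(1))  # (B, 1)
            return (torch.logsumexp(q_all, 1, keepdim=True) - q_data).mean()

        # Usage: total_loss = td_loss + cql_alpha * cql_penalty(states, actions)
        states, actions = torch.randn(32, 8), torch.randint(0, 4, (32,))
        print(cql_penalty(states, actions))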

      • KCI-indexed

        A Study on the Implementation of Crawling Robot using Q-Learning

        김현기,김경아,강민수 한국인공지능학회 2023 인공지능연구 (KJAI) Vol.11 No.4

        Machine learning comprises supervised learning, unsupervised learning, and reinforcement learning, distinguished by the type of data and the processing mechanism. Because the inputs and outputs of a crawling robot are unclear and difficult to model mathematically, this paper applies reinforcement learning, specifically Q-learning, one of the most effective model-free reinforcement learning techniques. We present a method for implementing a crawling robot that finds the optimal crawling motion through trial and error in a dynamic environment using a Q-learning algorithm. The goal is to use reinforcement learning to find the pair of motor angles that gives the best performance and, ultimately, to maintain the most mature and stable motion of the EV3 crawling robot. The robot was built with Lego Mindstorms using two motors, an ultrasonic sensor, a brick, and switches, and the implementation uses the EV3 Classroom software. By repeating the learning three times, a total of 60 data points were acquired, and a graph of the two motor angles versus crawling distance was plotted for clarity. Applying the Q-learning reinforcement learning algorithm, we confirmed that the crawling robot found the optimal motor angles and operated according to its training, and the results indicate directions for future research.
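
        A minimal sketch of the trial-and-error scheme this abstract describes, with the state space reduced to candidate motor-angle pairs and the reward standing in for the measured crawling distance. All names and the distance function are hypothetical stand-ins, not the authors' EV3 code; with a single repeated decision the update reduces to a bandit-style form of Q-learning.

        import random

        ANGLES = [20, 40, 60, 80, 100]     # candidate angles per motor (deg)
        alpha, epsilon = 0.5, 0.2
        Q = {(a1, a2): 0.0 for a1 in ANGLES for a2 in ANGLES}

        def crawl_distance(a1, a2):
            """Stand-in for one physical trial; peaks near (60, 80), noisy."""
            return -((a1 - 60) ** 2 + (a2 - 80) ** 2) / 100 + random.gauss(0, 1)

        for trial in range(60):            # the abstract reports 60 trials
            if random.random() < epsilon:  # explore a random angle pair
                a1, a2 = random.choice(list(Q))
            else:                          # exploit the best pair so far
                a1, a2 = max(Q, key=Q.get)
            r = crawl_distance(a1, a2)
            Q[(a1, a2)] += alpha * (r - Q[(a1, a2)])   # bandit-style update

        print("best angle pair:", max(Q, key=Q.get))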

      • KCI-indexed

        A Study on Calculating Optimal Abandon-Ship Routes Using Q-Learning, an AI Technique

        김원욱,김대희,윤대근 해양환경안전학회 2018 해양환경안전학회지 Vol.24 No.7

        In the worst maritime accidents, people must abandon ship, but ship structures are narrow and complex and operation takes place on rough seas, so escape is not easy. In particular, passengers on cruise ships are untrained and varied, making evacuation prospects worse. In such cases, the evacuation guidance provided by the crew plays a very important role. Likewise, if a rescuer enters a ship in distress to conduct rescue activities, it must be examined which zones are the most effective to enter. Generally, crew and rescuers take the shortest route, but if an accident occurs along the shortest route, the second-best alternative must be selected. To address this situation, this study calculates evacuation routes using Q-learning, a reinforcement learning method within machine learning. Reinforcement learning is one of the core functions of artificial intelligence and is currently used in many fields. Most evacuation analysis programs developed so far use shortest-path search; this study instead explores optimal paths using reinforcement learning. In the future, machine learning techniques will be applicable to various marine-related industries, for purposes such as selecting optimal routes for autonomous vessels and avoiding hazards.
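
        A minimal sketch of the second-best-route idea: Q-learning on a small grid whose shortest corridor is blocked by "accident" cells, so the greedy policy learns a detour. The layout and rewards are illustrative, not the paper's ship model.

        import random

        W, H = 5, 5
        GOAL = (4, 4)                       # assumed muster/boarding point
        BLOCKED = {(2, 2), (2, 3)}          # "accident" cells on the shortest route
        ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
        alpha, gamma, epsilon = 0.5, 0.95, 0.1
        Q = {((x, y), a): 0.0 for x in range(W) for y in range(H) for a in ACTIONS}

        def step(s, a):
            nx, ny = s[0] + a[0], s[1] + a[1]
            if not (0 <= nx < W and 0 <= ny < H) or (nx, ny) in BLOCKED:
                return s, -5.0, False       # walls and the accident zone cost dearly
            return (nx, ny), (100.0 if (nx, ny) == GOAL else -1.0), (nx, ny) == GOAL

        for _ in range(3000):               # training episodes
            s, done, steps = (0, 0), False, 0
            while not done and steps < 200:
                a = random.choice(ACTIONS) if random.random() < epsilon \
                    else max(ACTIONS, key=lambda a: Q[(s, a)])
                s2, r, done = step(s, a)
                nxt = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
                Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])
                s, steps = s2, steps + 1

        s, path = (0, 0), [(0, 0)]          # greedy rollout: the learned detour
        while s != GOAL and len(path) < 25:
            s, _, _ = step(s, max(ACTIONS, key=lambda a: Q[(s, a)]))
            path.append(s)
        print(path)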

      • KCI-indexed

        A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning

        성연식,조경은 한국정보처리학회 2012 Journal of information processing systems Vol.8 No.3

        The decision-making of agents in games is commonly based on reinforcement learning. To improve the quality of agents, it is necessary to address the learning time and state space that learning requires. Such problems can be addressed by Macro-Actions, which are defined and executed as sequences of primitive actions; in this line of research, learning time is reduced by cutting down the number of policy decisions the agent makes. Macro-Actions were originally defined as combinations of the same primitive action. Based on studies showing that Macro-Actions can be generated by learning, they are now understood to consist of diverse kinds of primitive actions. However, generating Macro-Actions still requires an enormous amount of learning time and state space. To resolve these issues, we apply insights from studies on learning tasks through Programming by Demonstration (PbD) to generate Macro-Actions while reducing the learning time and state space. In this paper, we propose a method to define and execute Macro-Actions: the Macro-Actions are learned from a human subject via PbD, and a policy over them is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data was collected from a human subject's driving control, the Macro-Actions required for driving a car were generated, and the policy necessary for driving on a track was learned. Acquiring Macro-Actions by PbD reduced the driving time by about 16% compared with the case in which Macro-Actions were defined directly by a human subject. In addition, the learning time was also reduced through faster convergence to the optimal policies.
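
        A minimal sketch of the macro-action mechanism described above: a macro is a demonstrated sequence of primitive actions that the policy executes atomically, cutting the number of policy decisions per episode. The macro names and contents below are made up for illustration, not the paper's demonstration data.

        from typing import Callable, Dict, List

        Primitive = str  # illustrative primitive-action labels

        # Macro-actions as demonstrated primitive sequences (made-up content):
        MACROS: Dict[str, List[Primitive]] = {
            "enter_curve": ["brake", "steer_left", "steer_left", "accelerate"],
            "straightaway": ["accelerate", "accelerate", "accelerate"],
        }

        def run_macro(name: str, env_step: Callable[[Primitive], float]) -> float:
            """Execute the macro's primitives in order as one atomic decision;
            the learner sees only the summed reward, so its policy chooses
            among macros rather than among primitives."""
            return sum(env_step(p) for p in MACROS[name])

        # Dummy environment step rewarding forward progress:
        print(run_macro("enter_curve", lambda p: 1.0 if p == "accelerate" else 0.0))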

      • A Study on the Motion Forms of a Mobile Robot Using a Reinforcement Learning Algorithm

        정영미(Y. M. Jeong),정석권(S. K. Jeong) 한국동력기계공학회 2009 한국동력기계공학회 학술대회 논문집 Vol.2009 No.6

        The main advantage of reinforcement learning is that it can provide solutions the designer did not anticipate. In this paper, we investigate the motion forms of a two-dimensional mobile robot using the Q-learning algorithm, one of the most interesting reinforcement learning methods. Moreover, we identify the transitional motion forms that appear during the learning process of a robot with two-dimensional mobility and analyze the motion forms along the x and y axes. The learning results show that reinforcement learning can produce unexpected new motion forms and can clarify the transitional motion forms that arise during learning.

      • Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning

        Jeong, Gyeeun,Kim, Ha Young Elsevier 2019 Expert Systems with Applications Vol.117 No.-

        Abstract: We study trading systems using reinforcement learning with three newly proposed methods to maximize total profits and reflect real financial market situations while overcoming the limitations of financial data. First, we propose a trading system that can predict the number of shares to trade. Specifically, we design an automated system that predicts the number of shares by adding a deep neural network (DNN) regressor to a deep Q-network, thereby combining reinforcement learning and a DNN. Second, we study various action strategies that use Q-values to analyze which action strategies are beneficial for profits in a confused market. Finally, we propose transfer learning approaches to prevent overfitting from insufficient financial data. We use four different stock indices (the S&P500, KOSPI, HSI, and EuroStoxx50) to experimentally verify our proposed methods and then conduct extensive research. The proposed automated trading system, which enables us to predict the number of shares with the DNN regressor, increases total profits by four times in S&P500, five times in KOSPI, 12 times in HSI, and six times in EuroStoxx50 compared with the fixed-number trading system. When the market situation is confused, delaying the decision to buy or sell increases total profits by 18% in S&P500, 24% in KOSPI, and 49% in EuroStoxx50. Further, transfer learning increases total profits by twofold in S&P500, 3 times in KOSPI, twofold in HSI, and 2.5 times in EuroStoxx50. The trading system with all three proposed methods increases total profits by 13 times in S&P500, 24 times in KOSPI, 30 times in HSI, and 18 times in EuroStoxx50, outperforming the market and the reinforcement learning model.

        Highlights:
        • A financial trading system is proposed to improve traders' profits.
        • The system uses the number of shares, action strategies, and transfer learning.
        • The number of shares is determined by using a DNN regressor.
        • When confusion exists, postponing a financial decision is the best policy.
        • Transfer learning can address problems of insufficient financial data.
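
        A minimal sketch of the first proposed component: a shared trunk feeding both a Q-head (action choice) and a regression head (number of shares). Layer sizes, input features, and the action coding are assumptions for illustration, not the authors' architecture.

        import torch
        import torch.nn as nn

        class TradingNet(nn.Module):
            """Shared trunk with a Q-head (buy/hold/sell) and a regressor
            head for the number of shares; all sizes are assumptions."""
            def __init__(self, n_features=32, n_actions=3):
                super().__init__()
                self.trunk = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
                self.q_head = nn.Linear(64, n_actions)   # deep Q-network head
                self.shares_head = nn.Linear(64, 1)      # DNN regressor head

            def forward(self, x):
                h = self.trunk(x)
                return self.q_head(h), torch.relu(self.shares_head(h))

        net = TradingNet()
        q_values, n_shares = net(torch.randn(1, 32))
        action = q_values.argmax(dim=1)   # assumed coding: 0=buy, 1=hold, 2=sell
        print(action.item(), n_shares.item())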

      • KCI-indexed

        Designing a Q-Learning Policy to Improve Agent Learning Speed

        용성중,박효경,유연휘,문일영 한국실천공학교육학회 2022 실천공학교육논문지 Vol.14 No.1

        Q-learning is a technique widely used as a basic algorithm for reinforcement learning. It trains the agent in the direction of maximizing reward through greedy action selection, choosing the action with the largest value among those that can be taken in the current state. In this paper, we study a policy that can speed up agent training with Q-learning in the Frozen Lake 8×8 grid environment. In addition, we compare the training results of the standard Q-learning algorithm with those of an algorithm that gives the agent's movement a 'direction' attribute. The results show that the Q-learning policy proposed in this paper significantly improves both accuracy and training speed compared with the standard algorithm.
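
        The abstract does not spell out how the 'direction' attribute is encoded, so the sketch below shows one plausible reading: the tabular state is augmented with the previous move (four directions plus "none" at episode start), using the gymnasium FrozenLake8x8-v1 environment. This is an assumption-laden illustration, not the paper's algorithm.

        import numpy as np
        import gymnasium as gym

        env = gym.make("FrozenLake8x8-v1", is_slippery=False)
        alpha, gamma, epsilon = 0.1, 0.99, 0.1
        # Q-table over (cell, previous move, action); 4 = "no previous move".
        Q = np.zeros((env.observation_space.n, 5, env.action_space.n))

        for _ in range(5000):
            s, _ = env.reset()
            prev, done = 4, False
            while not done:
                if np.random.rand() < epsilon:
                    a = env.action_space.sample()
                else:
                    a = int(Q[s, prev].argmax())
                s2, r, term, trunc, _ = env.step(a)
                done = term or trunc
                target = r + (0.0 if done else gamma * Q[s2, a].max())
                Q[s, prev, a] += alpha * (target - Q[s, prev, a])
                s, prev = s2, a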

      • KCI-indexed

        An Improved Deep Q-Network Algorithm Using Self-Imitation Learning

        선우영민(Yung-Min Sunwoo),이원창(Won-Chang Lee) 한국전기전자학회 2021 전기전자학회논문지 Vol.25 No.4

        Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting its past good experiences. When combined with reinforcement learning algorithms that have an actor-critic architecture, it has shown substantial performance improvements in various game environments. However, its application has been limited to reinforcement learning algorithms with an actor-critic architecture. In this paper, we propose a method for applying Self-Imitation Learning to Deep Q-Network, a value-based deep reinforcement learning algorithm, and train it in various game environments. By comparing the proposed algorithm with ordinary Deep Q-Network training results, we show that Self-Imitation Learning can be applied to Deep Q-Network and can improve its performance.
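
        A minimal sketch of how a self-imitation term can be attached to a value-based learner, following the SIL objective (Oh et al., 2018): past transitions whose observed return R exceeds the current estimate Q(s, a) are imitated, and the rest contribute nothing. The network and sizes are illustrative, not the paper's implementation.

        import torch
        import torch.nn as nn

        # Hypothetical value network; sizes are illustrative.
        q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

        def sil_loss(states, actions, returns):
            """0.5 * max(R - Q(s, a), 0)^2: imitate only transitions whose
            observed return beat the current value estimate."""
            q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            advantage = (returns - q_sa).clamp(min=0)
            return 0.5 * (advantage ** 2).mean()

        # Usage: total_loss = dqn_td_loss + beta_sil * sil_loss(s, a, R)
        states, actions = torch.randn(32, 8), torch.randint(0, 4, (32,))
        print(sil_loss(states, actions, torch.randn(32)))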
