RISS 학술연구정보서비스 (Academic Research Information Service): Search Results

      • KCI-indexed

        An Improved Deep Q-Network Algorithm Using Self-Imitation Learning

        Yung-Min Sunwoo, Won-Chang Lee. 한국전기전자학회 (Institute of Korean Electrical and Electronics Engineers), 2021, 전기전자학회논문지 (Journal of IKEEE) Vol.25 No.4

        Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting past good experiences. When combined with reinforcement learning algorithms that have an actor-critic architecture, it has shown substantial improvements in various game environments. However, its applications are limited to algorithms with an actor-critic architecture. In this paper, we propose a method for applying Self-Imitation Learning to Deep Q-Network, a value-based deep reinforcement learning algorithm, and train the resulting algorithm in various game environments. By comparing the results with those of ordinary Deep Q-Network training, we show that Self-Imitation Learning can be applied to Deep Q-Network and improves its performance.
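
        The abstract does not spell out how the SIL objective is attached to a value-based learner. One plausible reading, following the original SIL formulation, is to imitate only those stored transitions whose observed discounted return beat the current Q estimate; a minimal sketch under that assumption (the function name and tensor layout are hypothetical):

        import torch

        def sil_loss_for_dqn(q_net, states, actions, returns):
            """Self-Imitation Learning term adapted to a Q-network: only
            transitions whose observed return R exceeds the current estimate
            Q(s, a) contribute, so the network imitates past behavior that
            turned out better than it currently predicts."""
            q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            advantage = (returns - q_values).clamp(min=0)  # (R - Q)+, as in SIL
            return torch.mean(advantage ** 2)

        In the original SIL paper, a term of this kind is minimized alongside the usual TD loss, on samples drawn from a replay buffer prioritized by the clipped advantage.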

      • Discrete Task-Space Automatic Curriculum Learning for Robotic Grasping

        Anil Kurkcu, Cihan Acar, Domenico Campolo, Keng Peng Tee. 제어로봇시스템학회 (Institute of Control, Robotics and Systems, ICROS), 2021, Proceedings of the ICROS International Conference Vol.2021 No.10

        Deep reinforcement learning algorithms struggle in robotics, where data collection is time-consuming and in some cases safety-constrained. For sample efficiency, curriculum learning has shown good results in deep learning-based methods; the difficulty lies in generating the curriculum itself, which the field of automatic curriculum learning is trying to solve. We present an automatic curriculum learning algorithm for discrete task-space scenarios. Our curriculum generation is based on a difficulty measure between tasks and a learning-progress metric within a task. We apply our algorithm to a grasp-learning problem involving 49 diverse objects. Our results show that a policy trained with a curriculum is more sample-efficient than learning from scratch and can learn tasks that the latter could not learn within a reasonable amount of time.
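
        The abstract names the two signals that drive curriculum generation (a difficulty measure between tasks and a learning-progress metric within a task) but not the selection rule. A toy sampler combining them might look like the following, where the scoring rule and all names are illustrative assumptions rather than the paper's metric:

        import random

        def pick_next_task(tasks, progress, difficulty, current):
            """Toy automatic-curriculum step over a discrete task space:
            favor tasks with high recent learning progress whose difficulty
            is close to that of the task just trained on."""
            def score(t):
                return progress[t] - abs(difficulty[t] - difficulty[current])
            weights = [max(score(t), 1e-3) for t in tasks]
            return random.choices(tasks, weights=weights, k=1)[0]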

      • KCI-indexed (Excellent)

        [Artificial Intelligence, Neural Networks, and Fuzzy Systems] Dual Policy Network Reinforcement Learning for Speaker-Specific Dialog Generation

        Chang-Hwan Lee. 대한전자공학회 (The Institute of Electronics and Information Engineers), 2019, 전자공학회논문지 (Journal of the IEIE) Vol.56 No.4

        Deep reinforcement learning, which combines deep learning and reinforcement learning, is now widely used in many fields and plays an important role in chatbot development, where a policy network implemented with deep learning is typically trained for natural-language generation using policy-gradient methods. In this paper, we propose a new deep reinforcement learning method for dialog generation that uses two policy networks (a dual policy network) so that learning can be conditioned on the characteristics of the speaker: one policy network is dedicated to the dialog policy of a specific speaker, while the other covers all other speakers. Experiments on a real dialog dataset show that the proposed model generates dialog that better matches the speaker's characteristics than the existing model.
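
        The abstract describes the architecture only as two policy networks, one per speaker type. A minimal sketch of that routing idea (the shared encoder, layer sizes, and the speaker_is_target flag are assumptions, not the paper's design):

        import torch.nn as nn

        class DualPolicyChat(nn.Module):
            """Route each dialog turn to the policy head dedicated to the
            target speaker; both heads share one utterance encoder."""

            def __init__(self, vocab_size, hidden=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, hidden)
                self.encoder = nn.GRU(hidden, hidden, batch_first=True)
                self.policy_target = nn.Linear(hidden, vocab_size)   # specific speaker
                self.policy_generic = nn.Linear(hidden, vocab_size)  # other speakers

            def forward(self, tokens, speaker_is_target):
                h, _ = self.encoder(self.embed(tokens))
                head = self.policy_target if speaker_is_target else self.policy_generic
                return head(h[:, -1])  # logits over the next token

        Both heads would be trained with the same policy-gradient objective, each on the turns of its own speaker.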

      • An Analysis of Google's Deep Reinforcement Learning Patents

        최연성. 한국지식재산교육연구학회 (Korea Institute of Intellectual Property Education and Research), 2019, 지식재산 교육과 연구 (Intellectual Property Education and Research) Vol.7 No.2

        Machine learning with neural networks is traditionally divided into supervised, unsupervised, and reinforcement learning. Reinforcement learning was studied early in psychology, and there had already been several studies in artificial intelligence, but its efficiency was not good enough for practical problem solving. Its performance improved dramatically once it was combined with deep learning; this is the AlphaGo algorithm. Since AlphaGo, reinforcement learning has enjoyed great popularity and is used in many places. Patents on the technology were filed by DeepMind, a subsidiary of Google, and can be regarded as a core AI technology owned by Google. This paper analyzes Google's patents on the deep reinforcement learning used in AlphaGo and, on that basis, considers the importance of foundational technology.

      • KCI-indexed

        A Reinforcement Learning-Based Scheduler Using Dynamic Precedence in Deterministic Networks

        류지혜, 박규동, 권주혁, 정진우. 한국통신학회 (Korea Institute of Communications and Information Sciences, KICS), 2023, 韓國通信學會論文誌 (Journal of KICS) Vol.48 No.4

        Smart industry, metaverse, digital-twin, and military applications require deterministic data delivery in large-scale networks. This paper proposes reinforcement learning-based scheduling that dynamically assigns different precedences to flows, in addition to each flow's class or priority, and determines the scheduling algorithm according to the flow's precedence. In the proposed algorithm with two precedence queues, the reinforcement learning agent takes two actions: it assigns flow precedences according to a specified criterion and selects a scheduling algorithm. Depending on the purpose of the network, any factor of high importance could serve as the precedence criterion; in this study, the deadline required by the flow is the major factor in the precedence decision. Using DDQN (Double Deep Q-Network), a deep learning-based reinforcement learning model, the precedence and the scheduling algorithm are determined by observing the state of the network and selecting an action at each fixed-length decision period. In a network simulator developed for the study, the DDQN agent showed better performance than various heuristic algorithms.
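
        The learner here is DDQN. For reference, a minimal sketch of the Double DQN target, in which the online network selects the next action and the target network evaluates it (a standard formulation, not the authors' code):

        import torch

        def ddqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
            """Double DQN bootstrap target: decoupling action selection from
            action evaluation reduces the overestimation bias of vanilla DQN.
            `dones` is a 0/1 float tensor marking terminal transitions."""
            with torch.no_grad():
                best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
                next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
                return rewards + gamma * (1.0 - dones) * next_q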

      • KCI-indexed

        A Tetris Robot Using Deep Reinforcement Learning

        박관우, 김정수. 제어·로봇·시스템학회 (Institute of Control, Robotics and Systems, ICROS), 2022, 제어·로봇·시스템학회 논문지 (Journal of Institute of Control, Robotics and Systems) Vol.28 No.12

        In this paper, we develop an artificial-intelligence Tetris robot that plays the Tetris game autonomously. The robot consists of a game agent that learns how to play using reinforcement learning and hardware that plays the actual game. To develop the game agent, a Markov decision process was defined and policy-based deep reinforcement learning was applied: the agent was trained with the PPO (Proximal Policy Optimization) algorithm, using a multi-agent learning method for the PPO training. During learning, the PPO-based game agent took the game screen as input and applied actions to the game through software, playing the Tetris game 500,000 times. For the robot to play the actual game, the neural network corresponding to the learned game agent was stored on a Jetson Xavier, and a motor and a camera were used. In other words, the standalone Tetris robot, separate from the computer on which the Tetris game runs, consists of a Jetson Xavier, one camera, one Arduino MEGA, three servo motors, and three fingers. To evaluate the robot's performance, the value function of the game agent is presented, and the performance of the actual robot is verified through demonstration.
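
        The agent is trained with PPO; for reference, a minimal sketch of PPO's clipped surrogate loss (the standard formulation, not the authors' implementation):

        import torch

        def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
            """PPO clipped objective: the probability ratio is clipped to
            [1 - eps, 1 + eps] so one update cannot move the policy too far
            from the policy that collected the data."""
            ratio = torch.exp(new_logp - old_logp)
            unclipped = ratio * advantages
            clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
            return -torch.min(unclipped, clipped).mean()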

      • KCI-indexed

        Improving the Generalization Performance of Image-Based Reinforcement Learning via Strong Data Augmentation and Contrastive Learning

        박상훈, 유진우. 한국자동차공학회 (Korean Society of Automotive Engineers, KSAE), 2023, 한국 자동차공학회논문집 (Transactions of KSAE) Vol.31 No.12

        In this paper, we propose a convolutional contrastive learning method that improves the generalization performance of image-based reinforcement learning. Prior work mainly augments the input images, but strong augmentation hinders the stability of reinforcement learning. By gradually increasing the random image-mixing ratio during training, the agent is not destabilized by strong data augmentation while the effect on generalization performance is maximized. Experiments on DM Control test environments show that the proposed method outperforms existing studies on the generalization of image-based reinforcement learning.
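
        The key mechanism the abstract describes is a random image-mixing ratio that ramps up over training. A minimal sketch of one such schedule (the linear ramp and the max_ratio value are illustrative assumptions, not the paper's settings):

        def mixed_observation(obs, distractor, step, total_steps, max_ratio=0.5):
            """Blend each observation with a random distractor image,
            ramping the mixing ratio from 0 to max_ratio over training so
            the agent adapts to strong augmentation gradually. Works
            elementwise on NumPy arrays or torch tensors of equal shape."""
            ratio = max_ratio * min(step / total_steps, 1.0)
            return (1 - ratio) * obs + ratio * distractor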

      • Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning

        Jeong, Gyeeun; Kim, Ha Young. Elsevier, 2019, Expert Systems with Applications Vol.117

        We study trading systems using reinforcement learning with three newly proposed methods to maximize total profits and reflect real financial market situations while overcoming the limitations of financial data. First, we propose a trading system that can predict the number of shares to trade. Specifically, we design an automated system that predicts the number of shares by adding a deep neural network (DNN) regressor to a deep Q-network, thereby combining reinforcement learning and a DNN. Second, we study various action strategies that use Q-values to analyze which action strategies are beneficial for profits in a confused market. Finally, we propose transfer learning approaches to prevent overfitting from insufficient financial data. We use four different stock indices (the S&P500, KOSPI, HSI, and EuroStoxx50) to experimentally verify our proposed methods and then conduct extensive research. The proposed automated trading system, which enables us to predict the number of shares with the DNN regressor, increases total profits by four times in S&P500, five times in KOSPI, 12 times in HSI, and six times in EuroStoxx50 compared with the fixed-number trading system. When the market situation is confused, delaying the decision to buy or sell increases total profits by 18% in S&P500, 24% in KOSPI, and 49% in EuroStoxx50. Further, transfer learning increases total profits by twofold in S&P500, 3 times in KOSPI, twofold in HSI, and 2.5 times in EuroStoxx50. The trading system with all three proposed methods increases total profits by 13 times in S&P500, 24 times in KOSPI, 30 times in HSI, and 18 times in EuroStoxx50, outperforming the market and the reinforcement learning model.

        Highlights:
        • A financial trading system is proposed to improve traders' profits.
        • The system uses the number of shares, action strategies, and transfer learning.
        • The number of shares is determined by using a DNN regressor.
        • When confusion exists, postponing a financial decision is the best policy.
        • Transfer learning can address problems of insufficient financial data.
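
        The first contribution pairs a deep Q-network that picks the action (buy, hold, sell) with a DNN regressor that sizes the trade. A minimal sketch of that pairing; the shared body, layer sizes, and class name are assumptions rather than the paper's architecture:

        import torch.nn as nn

        class TradingAgent(nn.Module):
            """Two-headed network: Q-values over trading actions plus a
            regression head that predicts the number of shares to trade."""

            def __init__(self, n_features, hidden=64):
                super().__init__()
                self.body = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
                self.q_head = nn.Linear(hidden, 3)        # Q-values: buy, hold, sell
                self.shares_head = nn.Linear(hidden, 1)   # DNN regressor: share count

            def forward(self, market_state):
                z = self.body(market_state)
                return self.q_head(z), self.shares_head(z).squeeze(-1)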

      • KCI-indexed

        A DASH System Using A3C-Based Reinforcement Learning

        Choi, Minje; Lim, Kyungshik. 대한임베디드공학회 (Institute of Embedded Engineering of Korea, IEMEK), 2022, 대한임베디드공학회논문지 (IEMEK Journal of Embedded Systems and Applications) Vol.17 No.5

        The simple procedural segment-selection algorithm commonly used in Dynamic Adaptive Streaming over HTTP (DASH) shows severe weaknesses in providing high-quality streaming services over integrated mobile networks composed of various wired and wireless links. A major issue is how to cope with dynamically changing underlying network conditions; the key is to make the segment-selection algorithm much more adaptive to fluctuating network traffic. This paper presents a system architecture that replaces the existing procedural segment-selection algorithm with a deep reinforcement learning algorithm based on the Asynchronous Advantage Actor-Critic (A3C). The distributed A3C-based deep learning server is designed and implemented so that multiple clients in different network conditions can stream videos simultaneously, collect learning data quickly, and learn asynchronously, greatly improving learning speed as the number of video clients increases. The performance analysis shows that the proposed algorithm outperforms both the conventional DASH algorithm and the Deep Q-Network algorithm in terms of the user's quality of experience and the speed of deep learning.
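
        For reference, a minimal sketch of the per-worker objective A3C optimizes: a policy-gradient term weighted by the advantage, a value-regression term, and an entropy bonus that keeps exploration alive (coefficients are conventional defaults, not the paper's settings):

        import torch

        def a3c_loss(logits, actions, values, returns, beta=0.01):
            """Combined A3C loss for one batch of worker transitions."""
            dist = torch.distributions.Categorical(logits=logits)
            advantages = returns - values.detach()
            policy_loss = -(dist.log_prob(actions) * advantages).mean()
            value_loss = (returns - values).pow(2).mean()
            entropy_bonus = dist.entropy().mean()
            return policy_loss + 0.5 * value_loss - beta * entropy_bonus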

      • Machine learning: Overview of the recent progresses and implications for the process systems engineering field

        Lee, Jay H.; Shin, Joohyun; Realff, Matthew J. Elsevier, 2018, Computers & Chemical Engineering Vol.114

        Machine learning (ML) has recently gained in popularity, spurred by well-publicized advances like deep learning and widespread commercial interest in big data analytics. Despite the enthusiasm, some renowned experts of the field have expressed skepticism, which is justifiable given the disappointment with the previous wave of neural networks and other AI techniques. On the other hand, new fundamental advances like the ability to train neural networks with a large number of layers for hierarchical feature learning may present significant new technological and commercial opportunities. This paper critically examines the main advances in deep learning. In addition, connections with another ML branch, reinforcement learning, are elucidated and its role in control and decision problems is discussed. Implications of these advances for the fields of process and energy systems engineering are also discussed.

        Highlights:
        • Recent advances in deep learning and reinforcement learning (RL) are reviewed.
        • Motivation, early problems, and recent resolutions of deep learning are discussed.
        • The idea of RL and its success in the game of Go (a la AlphaGo) are introduced.
        • Applicability of RL to multi-stage decision problems in industries is discussed.
        • Potential applications and research directions of ML in the PSE domains are given.
