Mizutani Eiji, Dreyfus Stuart — Korea Institute of Communication Sciences (KICS), 2023, ICT Express Vol.9 No.6
Reinforcement learning (RL) is fundamental to current artificial intelligence (AI). Since dynamic programming (DP), which is based on Richard Bellman’s principle of optimality, is the basis of RL (and of other AI disciplines such as A* search), it is important to apply that principle correctly and artfully. This tutorial uses examples, many from published articles, to examine both the artful and the occasionally erroneous application of Bellman’s principle. Our contribution is twofold. First, we emphasize the necessity of a general thought experiment, asking what we call “consultant questions,” to determine the proper state space on which a DP formulation can be established by Bellman’s optimality principle. Second, we convey the role of art, rooted in common sense and effortful cleverness, in designing efficient DP. Because Bellman’s principle is intuitive and appeals to common sense, it is sometimes misunderstood. For illustrative purposes, given a directed acyclic graph (DAG), we consider four deterministic minimum-cost path network problems (found in the literature), each of which raises a validity issue for Bellman’s principle under a different criterion. We approach them consistently through the conventional wisdom of state augmentation (on top of the physical system’s state space). As a result, we show that our artful choice of state description not only renders the principle valid in each problem, but also makes each DP as efficient as the standard DP that solves a shortest-path problem in the same DAG, successfully circumventing the so-called curse of dimensionality, a price frequently paid for state enlargement. Finally, we discuss the potential importance of this type of path network problem for practical applications (e.g., communication networks).
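The "standard DP that solves a shortest-path problem in the same DAG" mentioned in the abstract can be sketched as a backward recursion that applies Bellman's principle directly: the minimum cost-to-go from a node is the cheapest edge cost plus the optimal cost-to-go from the successor. The sketch below is illustrative only; the graph, node names, and edge costs are invented for this example and do not come from the paper.

```python
from functools import lru_cache

# Hypothetical example DAG: adjacency map from each node to its
# successors with edge costs. All values are illustrative.
edges = {
    "s": {"a": 2, "b": 5},
    "a": {"b": 1, "t": 6},
    "b": {"t": 2},
    "t": {},
}

@lru_cache(maxsize=None)
def min_cost(node: str, target: str = "t") -> float:
    """Minimum cost from `node` to `target` via Bellman's recursion:
    V(x) = min over successors y of [c(x, y) + V(y)], with V(target) = 0.
    Memoization (lru_cache) makes this a dynamic program: each node's
    cost-to-go is computed once, so the run time is linear in edges."""
    if node == target:
        return 0.0
    successors = edges[node]
    if not successors:
        return float("inf")  # dead end: target unreachable from here
    return min(c + min_cost(y, target) for y, c in successors.items())

print(min_cost("s"))  # s -> a -> b -> t: 2 + 1 + 2 = 5.0
```

State augmentation, as discussed in the paper, would replace the plain node label with an enriched state (node plus whatever extra information the cost criterion requires) while keeping this same recursive structure.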