Neural Least Squares Policy Iteration Learning with a Critic-Only Architecture
Intelligent control of real-world control problems based on reinforcement learning often requires decision-making in a large or continuous state-action space. Since the number of adjustable parameters in discrete reinforcement learning is directly related to the cardinality of the problem's state-action space, such problems suffer from the curse of dimensionality, slow learning, and low efficiency. The use of continuous reinforcement learning methods to overcome these problems has attracted considerable research interest. In this paper, a novel Neural Reinforcement Learning (NRL) scheme is proposed. The presented method is model-free and independent of a learning rate; it is obtained by combining Least Squares Policy Iteration (LSPI) with Radial Basis Functions (RBF) as a function approximator, and we call it "Neural Least Squares Policy Iteration" (NLSPI). By using the basis functions defined in the RBF neural network structure, the method resolves the challenge of defining state-action basis functions in LSPI. To validate the presented method, the performance of the proposed algorithm on two control problems has been compared with that of other methods. The overall results show the superiority of our method in learning the pseudo-optimal policy.
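The combination the abstract describes (LSPI policy evaluation over RBF state-action features) can be illustrated with a minimal sketch. This is not the paper's implementation; all function names, shapes, and hyperparameters (Gaussian centers, width, ridge term) are illustrative assumptions, showing only the generic LSTDQ step that LSPI repeats under a greedy policy.

```python
import numpy as np

def rbf_features(state, action, centers, width, n_actions):
    """Gaussian RBF activations for `state`, placed in the block for `action`.
    (Hypothetical feature layout: one copy of the RBF vector per action.)"""
    phi_s = np.exp(-np.sum((centers - state) ** 2, axis=1) / (2 * width ** 2))
    phi = np.zeros(len(centers) * n_actions)
    phi[action * len(centers):(action + 1) * len(centers)] = phi_s
    return phi

def lstdq(samples, w, centers, width, n_actions, gamma=0.99):
    """One LSPI policy-evaluation step: solve A w' = b over the sample set,
    with the next action chosen greedily under the current weights `w`."""
    k = len(centers) * n_actions
    A = 1e-6 * np.eye(k)          # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = rbf_features(s, a, centers, width, n_actions)
        q_next = [w @ rbf_features(s_next, ap, centers, width, n_actions)
                  for ap in range(n_actions)]
        phi_next = rbf_features(s_next, int(np.argmax(q_next)),
                                centers, width, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

# Toy usage: 1-D state in [0, 1], 2 actions, random transitions.
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
samples = [(rng.random(1), int(rng.integers(2)), rng.random(), rng.random(1))
           for _ in range(50)]
w = np.zeros(len(centers) * 2)
for _ in range(3):                # a few policy-iteration sweeps
    w = lstdq(samples, w, centers, width=0.25, n_actions=2)
print(w.shape)
```

Because the weights come from a closed-form least-squares solve rather than gradient steps, no learning-rate parameter appears, which is the "learning rate independent" property the abstract claims for the LSPI family.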