ForSts: Tacit Collusion in Repeated Non-Cooperative Games Using a Forwarding N-Steps Reinforcement Learning Algorithm
In game theory, the well-known solution for obtaining the best possible profit in non-repeated games is the Nash equilibrium. In some repeated non-cooperative games, however, agents can earn more than the Nash equilibrium profit through tacit collusion. Reinforcement learning is one method for achieving such above-equilibrium profit, but existing reinforcement learning-based methods consider only one step in the learning process. Profit in these games can be achieved and improved by taking more than one step into account. Accordingly, this paper proposes a learning-based forwarding N-steps algorithm called Forwarding Steps (ForSts). The main idea behind ForSts is to improve the performance of agents in non-cooperative games by observing the last N-step rewards. Since ForSts is applied in game theory to learn tacit collusion, it is evaluated on the iterated prisoner's dilemma, a traditional example game, and the Cournot market. The results show that in the iterated prisoner's dilemma, agents using ForSts achieve higher profit than agents playing at the Nash equilibrium. Likewise, in the Cournot electricity market, the total profit of agents using ForSts is 3.614% higher than the total profit of agents playing at the Nash equilibrium.
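The abstract does not give the ForSts update rule, so the following is only a minimal, hypothetical sketch of the ingredient it describes: an N-step discounted return over the prisoner's dilemma payoff matrix, illustrating why a multi-step view makes tacit cooperation (payoff 3 per round) preferable to the one-shot Nash play of mutual defection (payoff 1 per round). The function name `n_step_return` and the discount factor are illustrative assumptions, not the paper's notation.

```python
# Classic prisoner's dilemma payoffs for (row, column) players:
# 'C' = cooperate, 'D' = defect. Mutual defection (D, D) is the Nash equilibrium.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def n_step_return(rewards, gamma, bootstrap=0.0):
    """Discounted sum of the next N rewards plus an optional bootstrapped tail.

    This is a generic forward-view N-step return, assumed here to stand in
    for the multi-step reward signal the ForSts abstract refers to.
    """
    g = bootstrap
    for r in reversed(rewards):   # fold right: r_t + gamma * (r_{t+1} + ...)
        g = r + gamma * g
    return g

# Over any horizon N, the collusive stream of mutual-cooperation rewards
# dominates the Nash stream of mutual-defection rewards:
n = 5
collusive = n_step_return([PAYOFF[('C', 'C')][0]] * n, gamma=0.9)
nash      = n_step_return([PAYOFF[('D', 'D')][0]] * n, gamma=0.9)
print(collusive > nash)  # True
```

A one-step learner sees only the immediate temptation payoff (5 for defecting against a cooperator); an N-step view exposes the long-run cost of triggering mutual defection, which is the intuition behind learning tacit collusion from multi-step rewards.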