Search results for articles related to the keyword "Deep reinforcement learning" in journals of the "Engineering and Technology" group

  • M. R. Abbasnezhad, A. Jahangard-Rafsanjani *, A. Milani Fard
    Web applications (apps) are integral to our daily lives, and they must be tested for reliability before users can depend on them. Existing approaches to testing web apps still struggle to achieve high coverage of app functionality. On the one hand, web apps typically have an extensive state space, which makes testing every state inefficient and time-consuming. On the other hand, certain functionalities can only be reached through specific sequences of actions. The optimal testing strategy therefore depends heavily on the app's features. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy for a task through trial and error, guided by positive or negative rewards rather than explicit supervision. Deep RL extends RL by exploiting the learning capabilities of neural networks, which makes it suitable for testing complex state spaces such as those found in web apps. However, existing approaches rely on basic RL. We propose WeDeep, a Deep RL testing approach for web apps, and evaluate it on seven open-source web apps. The experimental results show that it achieves higher code coverage and fault detection than existing methods.
    Keywords: Deep Reinforcement Learning, Automated Testing, Test Generation, Web Application
  • H. Mohammadian Khalafansara, J. Keighobadi *
    Sustaining life on a warming Earth calls for progress in renewable energy devices and the adoption of new technologies such as artificial intelligence. Given the enormous potential and consistency of wave energy, the wave energy converter (WEC) plays a vital role in uniform energy harvesting. In this paper, the significant environmental changes in the ocean motivate an intelligent feedback control system that mitigates the impact of disturbances and variable wind effects on the efficacy of WECs. Deep reinforcement learning (DRL), a powerful machine intelligence technique, is capable of identifying WECs as black-box models. Based on the DRL model, the disturbance and the unmeasured state variables are estimated simultaneously in the extended state observer section. Shortcomings in the identification data, and the limited number of deep neural network layers that real-time operation allows, are compensated by an immersion and invariance-based extended state observer, which also improves robustness to unwanted exogenous noise. In the overall intelligent control system, the estimated parameters are fed into the DRL actor-critic networks. The actor network predicts the control action, and the critic network then provides the criterion for evaluating the accuracy of the actor's estimate. The output of the critic stage is backpropagated through the layers to update the network weights. Simulation results in MATLAB indicate that the unmeasured parameters and states converge to their true values and demonstrate the significance of the newly designed intelligent DRL method.
    Keywords: Wave Energy Converter, Extended state observer, Immersion and invariance-based control, Deep reinforcement learning, Uniform energy
  • Mohammadreza Abbasnezhad, Amir Jahangard Rafsanjani*, Amin Milani Fard

    Web application (app) exploration is a crucial part of various analysis and testing techniques. However, current methods cannot properly explore the state space of web apps, so techniques are needed that guide the exploration toward acceptable functionality coverage. Reinforcement Learning (RL) is a machine learning method in which the best way to perform a task is learned through trial and error, with the help of positive or negative rewards instead of direct supervision. Deep RL is a recent extension of RL that makes use of the learning capabilities of neural networks, which makes it suitable for exploring the complex state space of web apps. However, existing methods rely on basic RL. In this research, we present DeepEx, a Deep RL-based strategy for systematically exploring web apps. Evaluated empirically on seven open-source web apps, DeepEx demonstrated a 17% improvement in code coverage and a 16% improvement in navigational diversity over the state-of-the-art RL-based method. It also showed a 19% increase in structural diversity. These results support the advantage of Deep RL over traditional RL methods in web app exploration.

    Keywords: Deep Reinforcement Learning, Exploration, Model Generation, Web Application
  • Mehdy Roayaei Ardakany*, Ali Afrougheh

    Computer games have played an important role in the development of artificial intelligence in recent years. Throughout the history of the field, games have served as a suitable environment for trial and error and for evaluating new ideas and algorithms. Different methods, including rule-based methods, tree search methods, and machine learning methods (supervised learning and reinforcement learning), have been developed to create intelligent agents in different games. Notable examples are Deep Blue in chess and AlphaGo in Go. AlphaGo is the first computer program to defeat an expert human Go player, and Deep Blue, a chess-playing expert system, is the first computer program to win a match against a world champion. In this paper, we focus on the match-3 game, a popular genre on mobile phones whose very large, random state space makes learning difficult. Its reward function is also random, which makes learning unstable. Much research has been done on different games, including match-3. Its aim has generally been to play optimally or to predict the difficulty of stages designed for human players; predicting stage difficulty helps game developers improve the quality of their games and provide a better experience for users. Based on the approach used, past work can be divided into three main categories: search-based methods, machine learning methods, and heuristic methods.

    In this paper, an intelligent agent based on deep reinforcement learning is presented whose goal is to maximize the score in the match-3 game. Reinforcement learning is a branch of machine learning in which the agent learns the optimal policy for choosing actions through its experience of interacting with the environment. In deep reinforcement learning, reinforcement learning algorithms are used together with deep neural networks. The proposed method uses dedicated mappings for the action space and the state space, together with a novel neural network structure for the match-3 environment that is able to learn a large state space. The contributions of this article can be summarized as follows. First, the action space is mapped to a two-dimensional matrix in which valid and invalid actions can easily be separated. Second, the state space is mapped to the input of the deep neural network in a way that reduces the input space by reducing the depth of the convolutional filter, which improves the learning process. Third, the reward function stabilizes learning by separating random rewards from deterministic rewards. A comparison of the proposed method with existing methods, including PPO, DQN, A3C, greedy methods, and human players, shows the superior performance of the proposed method in the match-3 game.

    Keywords: deep reinforcement learning, random game, match-3, large state space
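
The valid/invalid action separation described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the authors' actual mapping, so everything here is an assumption for illustration: the board is a grid of integer tile ids, an action is a swap with the right or lower neighbour, and the hypothetical `action_mask` returns one 2D boolean matrix per swap direction.

```python
from typing import List

Board = List[List[int]]


def has_match(board: Board) -> bool:
    """Return True if three equal tiles line up in any row or column."""
    rows, cols = len(board), len(board[0])
    for r in range(rows):
        for c in range(cols):
            t = board[r][c]
            if c + 2 < cols and t == board[r][c + 1] == board[r][c + 2]:
                return True
            if r + 2 < rows and t == board[r + 1][c] == board[r + 2][c]:
                return True
    return False


def action_mask(board: Board) -> List[List[List[bool]]]:
    """mask[d][r][c] is True when swapping (r, c) with its right (d=0)
    or lower (d=1) neighbour produces at least one match."""
    rows, cols = len(board), len(board[0])
    mask = [[[False] * cols for _ in range(rows)] for _ in range(2)]
    for d, (dr, dc) in enumerate(((0, 1), (1, 0))):
        for r in range(rows - dr):
            for c in range(cols - dc):
                r2, c2 = r + dr, c + dc
                # Try the swap, test for a match, then undo it.
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
                mask[d][r][c] = has_match(board)
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
    return mask


board = [
    [1, 1, 2, 1],
    [3, 4, 5, 6],
    [6, 5, 4, 3],
]
mask = action_mask(board)
```

An agent could multiply its action scores by such a mask so that only swaps that create a match are ever selected.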
  • Y. R. Zhao, H. Y. Xu *, Z. Y. Xie
    A closed-loop control framework is developed for the co-flow jet (CFJ) airfoil by combining the numerical flow field environment of a CFJ0012 airfoil with Tensorforce, a deep reinforcement learning (DRL) library for Python. The DRL agent, trained by interacting with the numerical flow field environment, acquires a policy that adjusts the mass flow rate of the CFJ so that the stalled airfoil at an angle of attack (AoA) of 18 degrees reaches a target lift coefficient of 2.0, thereby effectively suppressing flow separation on the upper surface of the airfoil. A subsequent test shows that the policy finds a jet momentum coefficient of 0.049 at which the lift coefficient of the CFJ0012 airfoil reaches 2.01, an error of only 0.5%. To evaluate the generalization ability of the policy trained at an AoA of 18 degrees, two additional tests are conducted at AoAs of 16 and 20 degrees. Although the policy learned at another AoA does not bring the lift coefficient exactly to the target of 2.0, the errors remain below 5.5%, which means the policy trained at an AoA of 18 degrees can be applied to other AoAs to some extent. This work supports the practical application of CFJ technology, as the closed-loop control framework ensures good aerodynamic performance of the CFJ airfoil even in complex and changeable flight conditions.
    Keywords: Co-flow jet, Closed-loop control, Flow control, Lift enhancement, Deep reinforcement learning
  • Mehdy Roayaei *

    Contemporary machine learning models, like deep neural networks, require substantial labeled datasets for proper training. However, in areas such as natural language processing, a shortage of labeled data can lead to overfitting. To address this challenge, data augmentation, which involves transforming data points to maintain class labels and provide additional valuable information, has become an effective strategy. In this paper, a deep reinforcement learning-based text augmentation method for sentiment analysis was introduced, combining reinforcement learning with deep learning. The technique uses Deep Q-Network (DQN) as the reinforcement learning method to search for an efficient augmentation strategy, employing four text augmentation transformations: random deletion, synonym replacement, random swapping, and random insertion. Additionally, various deep learning networks, including CNN, Bi-LSTM, Transformer, BERT, and XLNet, were evaluated for the training phase. Experimental findings show that the proposed technique can achieve an accuracy of 65.1% with only 20% of the dataset and 69.3% with 40% of the dataset. Furthermore, with just 10% of the dataset, the method yields an F1-score of 62.1%, rising to 69.1% with 40% of the dataset, outperforming previous approaches. Evaluation on the SemEval dataset demonstrates that reinforcement learning can efficiently augment text datasets for improved sentiment analysis results.

    Keywords: Data Augmentation, Sentiment analysis, Deep reinforcement learning, Neural Network, DQN Algorithm
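
The four text transformations named in the abstract above (random deletion, synonym replacement, random swapping, random insertion) can be sketched as follows. This is only an illustration under stated assumptions: the tiny `SYNONYMS` table is hypothetical, and the abstract does not describe how the DQN agent chooses among the operations, so that part is not shown.

```python
import random
from typing import Dict, List

# Hypothetical synonym table; a real system would use a lexical resource.
SYNONYMS: Dict[str, List[str]] = {"good": ["fine", "great"], "film": ["movie"]}


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, but never return an empty sentence."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def random_swap(words: List[str], rng: random.Random) -> List[str]:
    """Exchange the positions of two randomly chosen words."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def synonym_replacement(words: List[str], rng: random.Random) -> List[str]:
    """Replace one word that has a known synonym."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words: List[str], rng: random.Random) -> List[str]:
    """Insert a synonym of one of the words at a random position."""
    out = list(words)
    candidates = [w for w in out if w in SYNONYMS]
    if candidates:
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out
```

In a reinforcement learning setup of the kind the abstract describes, each of these functions would plausibly correspond to one discrete action available to the agent.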
  • Zahra Zeinaly, Mahdi Sojoodi *
    Deep reinforcement learning methods have shown promising results in the development of traffic signal controllers. In this paper, we examine the resilience of a deep reinforcement learning controller under high traffic volume and a range of environmental disruptions, such as accidents, and propose a reliable controller for a dynamic traffic environment. Using a discretization approach, each road of the intersection is divided into cells, and we study how cell size, uniform or non-uniform, affects the efficiency of the algorithm. By selecting an extended and dense state space, the agent receives input that gives it a complete understanding of the environment. Deep Q-learning with experience replay is used to train the agent, and the proposed model is evaluated in the SUMO traffic simulator. The simulation results confirm the efficiency of the proposed method in reducing queue length even in the presence of a disruption.
    Keywords: Traffic safety, Accident, Traffic control, Deep reinforcement learning
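
The cell-based discretization described in the abstract above can be sketched in a few lines. The cell boundaries and the distance convention (metres from the stop line) are assumptions chosen for illustration, not values from the paper, and `occupancy` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import List, Sequence


def occupancy(distances: Sequence[float], cell_edges: Sequence[float]) -> List[int]:
    """Count vehicles per cell.

    distances: vehicle positions in metres from the stop line.
    cell_edges: increasing upper bounds of the cells, so cell k covers
                [cell_edges[k-1], cell_edges[k]) and cell 0 starts at 0.
    Vehicles beyond the last edge (or at negative positions) are ignored.
    """
    counts = [0] * len(cell_edges)
    for d in distances:
        idx = bisect_right(cell_edges, d)
        if d >= 0 and idx < len(cell_edges):
            counts[idx] += 1
    return counts


vehicles = [3.0, 8.0, 30.0]
uniform_cells = [10.0, 20.0, 30.0, 40.0]      # equal-sized cells
non_uniform_cells = [7.0, 14.0, 28.0, 56.0]   # finer cells near the stop line
```

Comparing the two count vectors for the same vehicles shows how the choice of cell sizes changes the state the agent observes.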
  • Yogesh Wankhede, Sheetal Rana, Faruk Kazi

    The hybrid electric train, which operates without overhead wires or traditional power sources, relies on hydrogen fuel cells and batteries for power. According to the authors, these fuel cell-based hybrid electric trains (FCHETs) are more efficient than trains powered by diesel or electricity and, because they produce no tailpipe emissions, are an eco-friendly mode of transport. The aim of this paper is to propose low-budget FCHETs that prioritize energy efficiency in order to reduce operating costs and minimize environmental impact. To this end, an energy management strategy (EMS) has been developed that optimizes the distribution of energy to reduce the amount of hydrogen required to power the train. The EMS achieves this by balancing battery charging and discharging. To enhance the performance of the EMS, the paper proposes a deep reinforcement learning (DRL) algorithm, specifically the deep deterministic policy gradient (DDPG), combined with transfer learning (TL), which can improve the system's efficiency when driving cycles change. DRL-based strategies are commonly used in energy management, but they suffer from unstable convergence, slow learning, and insufficient constraint handling. To address these limitations, an action masking technique is proposed that prevents the DDPG-based approach from producing actions that violate the system's physical limits. The DDPG+TL agent consumes up to 3.9% less energy than a conventional rule-based EMS while keeping the battery's charge level within a predetermined range. The results show that DDPG+TL can sustain battery charge with minimal hydrogen consumption and minimal training time for the agent.

    Keywords: Fuel Cell, State of Charge, Energy Management Strategy, Deep Reinforcement Learning, Deep Deterministic Policy Gradient, Transfer Learning
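
The action masking idea in the abstract above can be illustrated with a small sketch. The abstract does not specify the train's actual limits, so the action definition (a normalized battery power in [-1, 1], positive for discharging), the state-of-charge window, and the function names are all assumptions made for this example.

```python
from typing import Tuple


def feasible_bounds(soc: float, soc_min: float = 0.4, soc_max: float = 0.8) -> Tuple[float, float]:
    """Return the allowed (low, high) range of the normalized battery power.

    Positive values discharge the battery, negative values charge it.
    The SoC window is a hypothetical example, not a value from the paper.
    """
    low, high = -1.0, 1.0
    if soc <= soc_min:
        high = 0.0   # battery too empty: discharging is masked out
    if soc >= soc_max:
        low = 0.0    # battery too full: charging is masked out
    return low, high


def mask_action(raw_action: float, soc: float) -> float:
    """Project the actor's raw output onto the feasible range."""
    low, high = feasible_bounds(soc)
    return min(max(raw_action, low), high)
```

Applying `mask_action` to the actor output before it reaches the environment keeps every executed action inside the assumed physical limits, whatever the network proposes.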
  • M. Taghian, A. Asadi, R. Safabakhsh *

    The quality of the extracted features from a long-term sequence of raw prices of the instruments greatly affects the performance of the trading rules learned by machine learning models. Employing a neural encoder-decoder structure to extract informative features from complex input time-series has proved very effective in other popular tasks like neural machine translation and video captioning. In this paper, a novel end-to-end model based on the neural encoder-decoder framework combined with deep reinforcement learning is proposed to learn single instrument trading strategies from a long sequence of raw prices of the instrument. In addition, the effects of different structures for the encoder and various forms of the input sequences on the performance of the learned strategies are investigated. Experimental results showed that the proposed model outperforms other state-of-the-art models in highly dynamic environments.

    Keywords: Deep Reinforcement Learning, Deep Q-Learning, Single Stock Trading, Portfolio Management, Encoder-Decoder Framework
  • Zahra Dehghani Ghobadi *, Firoozeh Haghighi, Abdollah Safari

    Condition-based maintenance (CBM) involves making maintenance decisions based on the actual deterioration condition of the components. It consists of a chain of states representing various stages of deterioration and a set of maintenance actions, which makes it a sequential decision-making problem. Reinforcement Learning (RL) is a subfield of machine learning designed for automated decision-making. This article provides an overview of the reinforcement learning and deep reinforcement learning methods that have been used so far in condition-based maintenance optimization.

    Keywords: Reinforcement Learning, Deep reinforcement learning, Condition-based Maintenance, Markov decision process
  • Kourosh Dadashtabar Ahmadi*, Ali Akbar Kiaei, Mohammad Amin Abbaszadeh

    In this research, we develop a deep reinforcement learning-based method for autonomous robot navigation. Our approach is based on DDPG and one of its improved versions, SD3. We modified the algorithm to suit autonomous navigation problems and optimized it for this application. Because it uses convolutional layers, the modified algorithm can work with high-dimensional state spaces. To reduce oscillation during motion and to encourage faster movement through the environment, we propose two reward terms: a reward based on linear velocity and a penalty based on angular velocity. To improve generalization, we use an algorithm that randomly changes the shape, layout, and number of obstacles in the environment. To speed up learning and improve the robot's performance, we normalize all input data. Finally, the proposed algorithm is implemented with ROS and the Gazebo simulator, and the results show an improvement over the original SD3 and DDPG algorithms.

    Keywords: Autonomous navigation, Deep reinforcement learning, SD3, DDPG
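
The two reward terms and the input normalization mentioned in the abstract above could look roughly like this. The weights, the maximum sensor range, and both function names are hypothetical; the abstract does not state the actual formula.

```python
from typing import List, Sequence


def shaped_reward(linear_v: float, angular_v: float,
                  w_lin: float = 1.0, w_ang: float = 0.5) -> float:
    """Reward forward speed and penalize turning rate (weights are assumed)."""
    return w_lin * linear_v - w_ang * abs(angular_v)


def normalize(readings: Sequence[float], max_range: float) -> List[float]:
    """Scale range readings into [0, 1], clipping values beyond max_range."""
    return [min(max(r, 0.0), max_range) / max_range for r in readings]
```

A term of this shape would be added to the task reward (for example, reaching the goal or colliding), which is not shown here.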
  • Mohammadreza Moslehi *, Hossein Ebrahimpor-Komleh, Salman Goli, Reza Taji

    In recent years, the number of communication devices in the Internet of Things (IoT) has grown exponentially, and IoT has emerged as a technology that lets heterogeneous devices connect with each other across heterogeneous networks. This communication requires different levels of Quality of Service (QoS) and different policies depending on device type and location. To provide a specific level of QoS, we can combine emerging concepts in IoT infrastructure, software-defined networking (SDN), and machine learning algorithms. We use deep reinforcement learning for resource management and allocation in the control plane and present an algorithm that aims to optimize resource allocation. Simulation results show that the proposed algorithm improves network performance in terms of QoS parameters, including delay and throughput, compared to the Random and Round Robin methods. Its performance is also comparable to that of fuzzy and predictive methods.

    Keywords: Internet of Things, Software-Defined Networking (SDN), Deep Reinforcement Learning, QoS
  • Pegah Gazori, Dadmehr Rahbari, Mohsen Nickray *

    With the spread of IoT technology in recent years, the number of smart devices, and consequently the volume of data they collect, is increasing rapidly. At the same time, most IoT applications require real-time data analysis and low latency in service delivery. Under these circumstances, sending large volumes of data to cloud data centers for processing is impractical, and the fog computing paradigm is a better choice. Because the computational resources of fog nodes are limited, using them efficiently is of great importance. This paper considers the scheduling of IoT application tasks in the fog computing paradigm. The main goal is to reduce the latency of service delivery, and a deep reinforcement learning approach is used to achieve it. The proposed method combines the Q-Learning algorithm with deep learning and the experience replay and target network techniques. According to the simulation results, the DQLTS algorithm improves the ASD metric by 76% compared to QLTS and by 6.5% compared to the RS algorithm. It also converges faster than QLTS.

    Keywords: Internet of Things, Fog computing, Task Scheduling, Deep reinforcement learning
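
The combination of Q-learning, experience replay, and a target network named in the abstract above can be sketched on a toy problem. This is not the DQLTS algorithm: to stay self-contained, the deep network is replaced by a lookup table, and the fog scheduling environment is replaced by a hypothetical five-state chain in which action 1 moves right toward a rewarded terminal state.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2  # toy chain; action 1 moves right, action 0 moves left


def step(state, action):
    """Deterministic chain: reward 1.0 on reaching the right-most state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, gamma=0.9, alpha=0.5, eps=0.2,
          batch=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # online estimator
    target = [row[:] for row in q]                    # periodically synced copy
    replay = deque(maxlen=500)                        # experience replay buffer
    updates = 0
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 50:
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:  # greedy with random tie-breaking
                best = max(q[s])
                a = rng.choice([i for i in range(N_ACTIONS) if q[s][i] == best])
            s2, r, done = step(s, a)
            replay.append((s, a, r, s2, done))
            s, steps = s2, steps + 1
            # Learn from a random mini-batch of stored transitions.
            for ps, pa, pr, ps2, pdone in rng.sample(list(replay), min(batch, len(replay))):
                y = pr if pdone else pr + gamma * max(target[ps2])
                q[ps][pa] += alpha * (y - q[ps][pa])
            updates += 1
            if updates % sync_every == 0:
                target = [row[:] for row in q]        # refresh the target copy
    return q


q_values = train()
```

The bootstrapped target `y` is always computed from the frozen `target` copy, which is the role the target network plays in a deep Q-network; sampling from `replay` breaks the correlation between consecutive transitions.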
  • Seyed Ali Khoshroo, Seyed Hossein Khasteh*

    To accelerate learning in high-dimensional reinforcement learning problems, TD methods such as Q-learning or SARSA are usually combined with the eligibility traces mechanism. The recently introduced Deep Q-Network (DQN) algorithm uses deep neural networks in Q-learning to enable reinforcement learning algorithms to reach a greater understanding of the visual world and to extend to problems that were previously considered intractable. DQN, which is referred to as a deep reinforcement learning algorithm, has a low learning speed. In this paper, we use eligibility traces, one of the basic mechanisms in reinforcement learning, in combination with deep neural networks to improve the speed of the learning process. To compare its efficiency with that of DQN, the method was tested on a number of Atari 2600 games. The experimental results show that the proposed method significantly reduces learning time compared to DQN and converges faster to the desired model.

    Keywords: Deep Neural Networks, Deep Q Network (DQN), Eligibility Traces, Deep Reinforcement Learning
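
The speed-up that eligibility traces provide, as discussed in the abstract above, can be seen in a minimal tabular sketch. This is TD(λ) value prediction with accumulating traces on a hypothetical left-to-right chain, not the paper's deep-network method; the chain, the step size, and the function name are assumptions for illustration.

```python
from typing import List


def td_lambda_episode(values: List[float], lam: float,
                      gamma: float = 0.9, alpha: float = 0.5) -> List[float]:
    """Run one left-to-right episode and update `values` in place.

    The last state is terminal and entering it yields a reward of 1.0.
    """
    n = len(values)
    traces = [0.0] * n
    for s in range(n - 1):
        s2 = s + 1
        terminal = s2 == n - 1
        reward = 1.0 if terminal else 0.0
        v_next = 0.0 if terminal else values[s2]
        delta = reward + gamma * v_next - values[s]  # TD error
        traces[s] += 1.0                             # accumulating trace
        for i in range(n):
            values[i] += alpha * delta * traces[i]   # credit earlier states too
            traces[i] *= gamma * lam                 # decay every trace
    return values


one_step = td_lambda_episode([0.0] * 5, lam=0.0)
with_traces = td_lambda_episode([0.0] * 5, lam=0.9)
```

With `lam=0.0` only the state just before the goal learns anything from the first episode, while with `lam=0.9` every visited state receives a share of the final TD error, which is why traces tend to shorten learning time.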
Note
  • Results are sorted by publication date.
  • The keyword was searched only in the keywords field of the articles. To exclude unrelated results, the search covered only articles from journals in the same subject area as the source journal.