Search for articles matching the keyword «deep reinforcement learning» in journals of the «Electrical Engineering» group

  • M. R. Abbasnezhad, A. Jahangard-Rafsanjani *, A. Milani Fard
    Web applications (apps) are integral to our daily lives. Before users can use web apps, testing must be conducted to ensure their reliability. There are various approaches for testing web apps, but they still require improvement: they struggle to achieve high coverage of web app functionalities. On the one hand, web apps typically have an extensive state space, which makes testing all states inefficient and time-consuming. On the other hand, specific sequences of actions are required to access certain functionalities. Therefore, the optimal testing strategy depends heavily on the app's features. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task through trial and error rather than explicit supervision, guided by positive or negative rewards. Deep RL extends RL and exploits the learning capabilities of neural networks. These features make Deep RL suitable for testing complex state spaces, such as those found in web apps. However, existing approaches rely only on basic RL. We propose WeDeep, a Deep RL testing approach for web apps. We evaluated our method on seven open-source web apps. Experimental results show that it achieves higher code coverage and fault detection than existing methods.
    Keywords: Deep Reinforcement Learning, Automated Testing, Test Generation, Web Application
  • Mohammadreza Abbasnezhad, Amir Jahangard Rafsanjani*, Amin Milani Fard

     Web application (app) exploration is a crucial part of various analysis and testing techniques. However, current methods are not able to properly explore the state space of web apps. As a result, techniques must be developed to guide the exploration in order to achieve acceptable functionality coverage of web apps. Reinforcement Learning (RL) is a machine learning method in which the best way to perform a task is learned through trial and error, with the help of positive or negative rewards, instead of direct supervision. Deep RL is a recent extension of RL that makes use of neural networks' learning capabilities. This feature makes Deep RL suitable for exploring the complex state space of web apps. However, current methods employ only basic RL. In this research, we offer DeepEx, a Deep RL-based strategy for systematically exploring web apps. Empirically evaluated on seven open-source web apps, DeepEx demonstrated a 17% improvement in code coverage and a 16% enhancement in navigational diversity over the state-of-the-art RL-based method. Additionally, it showed a 19% increase in structural diversity. These results confirm the superiority of Deep RL over traditional RL methods in web app exploration.

    Keywords: Deep Reinforcement Learning, Exploration, Model Generation, Web Application
  • Mehdy Roayaei Ardakany*, Ali Afroughrh

    Computer games have played an important role in the advancement of artificial intelligence in recent years. Games have served as a suitable environment for trial and error and for testing various AI ideas and algorithms. Match-3 is a popular genre of mobile games with a very large, stochastic state space that makes learning difficult. This paper presents an intelligent agent based on deep reinforcement learning whose goal is to maximize the score in a match-3 game. The proposed agent uses a mapping of the action and state spaces, together with a novel neural-network architecture for the match-3 environment, capable of learning its many states. Comparison of the proposed method with other existing methods, including policy-based reinforcement learning, value-based reinforcement learning, greedy methods, and a human player, shows the superior performance of the proposed method in the match-3 game.

    Keywords: deep reinforcement learning, stochastic game, match-3, large state space
    Mehdy Roayaei Ardakany*, Ali Afroughrh

    Computer games have played an important role in the development of artificial intelligence in recent years. Throughout the history of AI, games have served as a suitable test environment for evaluating new approaches and algorithms. Different methods, including rule-based methods, tree-search methods, and machine learning methods (supervised learning and reinforcement learning), have been developed to create intelligent agents for different games. Notable examples include Deep Blue in chess and AlphaGo in Go: AlphaGo was the first computer program to defeat an expert human Go player, and Deep Blue, a chess-playing expert system, was the first computer program to win a match against a world champion. In this paper, we focus on the match-3 game, a popular genre on cell phones with a very large stochastic state space that makes learning difficult; its random reward function also makes learning unstable. Much research has been done on different games, including match-3, generally aiming either to play optimally or to predict the difficulty of stages designed for human players. Predicting stage difficulty helps game developers improve the quality of their games and provide a better experience for users. Based on the approach used, past work can be divided into three main categories: search-based methods, machine learning methods, and heuristic methods. In this paper, an intelligent agent based on deep reinforcement learning is presented whose goal is to maximize the score in the match-3 game.
Reinforcement learning is a branch of machine learning in which the agent learns the optimal policy for choosing actions through its experience of interacting with the environment. In deep reinforcement learning, reinforcement learning algorithms are combined with deep neural networks. In the proposed method, different mapping mechanisms for the action space and state space are used, and a novel neural-network structure for the match-3 environment is proposed to handle its large state space. The contributions of this article can be summarized as follows. First, an approach for mapping the action space to a two-dimensional matrix is presented, in which valid and invalid actions can easily be separated. Second, an approach is designed to map the state space to the input of the deep neural network, which reduces the input space by reducing the depth of the convolutional filters and thus improves the learning process. Third, the reward function makes the learning process stable by separating random rewards from deterministic rewards. Comparison of the proposed method with other existing methods, including PPO, DQN, A3C, a greedy method, and human players, shows the superior performance of the proposed method in the match-3 game.

    Keywords: deep reinforcement learning, random game, match-3, large state space
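The first contribution above, mapping the action space to a two-dimensional structure that separates valid from invalid swaps, can be sketched in miniature. The tiny board, the simplified "creates a line of three" validity rule, and the swap encoding below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: build a mask of valid/invalid match-3 swap actions.
# A swap of two adjacent tiles counts as "valid" here if it creates a
# horizontal or vertical line of three equal tiles (simplified rule).

def valid_swap_mask(board):
    """Return a dict mapping each adjacent-tile swap to its validity."""
    h, w = len(board), len(board[0])

    def makes_match(b):
        for r in range(h):                       # horizontal triples
            for c in range(w - 2):
                if b[r][c] == b[r][c + 1] == b[r][c + 2]:
                    return True
        for c in range(w):                       # vertical triples
            for r in range(h - 2):
                if b[r][c] == b[r + 1][c] == b[r + 2][c]:
                    return True
        return False

    mask = {}
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                r2, c2 = r + dr, c + dc
                if r2 < h and c2 < w:
                    b = [row[:] for row in board]
                    b[r][c], b[r2][c2] = b[r2][c2], b[r][c]
                    mask[((r, c), (r2, c2))] = makes_match(b)
    return mask

board = [
    [1, 2, 1],
    [2, 1, 1],
    [1, 1, 2],
]
mask = valid_swap_mask(board)
```

In a DQN-style agent, such a mask would typically be applied to the network's Q-value output so that invalid swaps are never selected.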
  • Yogesh Wankhede, Sheetal Rana, Faruk Kazi

    The hybrid electric train, which operates without overhead wires or traditional power sources, relies on hydrogen fuel cells and batteries for power. These fuel cell-based hybrid electric trains (FCHETs) are more efficient than those powered by diesel or electricity because they produce no tailpipe emissions, making them an eco-friendly mode of transport. The goal of this paper is to propose low-budget FCHETs that prioritize energy efficiency to reduce operating costs and minimize environmental impact. To this end, an energy management strategy (EMS) has been developed that optimizes the distribution of energy to reduce the amount of hydrogen required to power the train. The EMS achieves this by balancing battery charging and discharging. To enhance the performance of the EMS, we propose using a deep reinforcement learning (DRL) algorithm, specifically the deep deterministic policy gradient (DDPG) combined with transfer learning (TL), which can improve the system's efficiency when driving cycles change. DRL-based strategies are commonly used in energy management, but they suffer from unstable convergence, slow learning speed, and insufficient constraint-handling capability. To address these limitations, an action-masking technique is proposed that prevents the DDPG-based approach from producing actions that violate the system's physical limits. The DDPG+TL agent consumes up to 3.9% less energy than a conventional rule-based EMS while maintaining the battery's charge level within a predetermined range. The results show that DDPG+TL can sustain battery charge with minimal hydrogen consumption and minimal training time for the agent.

    Keywords: Fuel Cell, State of Charge, Energy Management Strategy, Deep Reinforcement Learning, Deep Deterministic Policy Gradient, Transfer Learning
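The action-masking idea, keeping the DDPG actor's output inside the system's physical limits, can be sketched as a projection step applied before the action reaches the environment. All limits and the simple power-balance model below are illustrative assumptions, not the paper's parameters.

```python
# Hedged sketch of action masking for a fuel-cell/battery EMS: the raw
# fuel-cell power proposed by the actor is projected onto the feasible set.
# All constants are illustrative, not taken from the paper.

P_FC_MIN, P_FC_MAX = 0.0, 100.0   # fuel-cell power limits (kW), assumed
SOC_MIN, SOC_MAX = 0.3, 0.8       # battery state-of-charge window, assumed

def mask_action(p_fc_raw, soc, p_demand):
    """Clip the actor's raw fuel-cell power so the battery is not driven
    outside its state-of-charge window."""
    p_fc = min(max(p_fc_raw, P_FC_MIN), P_FC_MAX)   # hard power limits
    p_batt = p_demand - p_fc                        # battery covers the rest
    if soc <= SOC_MIN and p_batt > 0:
        # Battery at its floor but would discharge: fuel cell covers demand.
        p_fc = min(p_demand, P_FC_MAX)
    elif soc >= SOC_MAX and p_batt < 0:
        # Battery full but would charge: cap fuel cell at the demand.
        p_fc = min(max(p_demand, P_FC_MIN), P_FC_MAX)
    return p_fc

masked = mask_action(150.0, 0.5, 60.0)   # raw action above the physical limit
```

Masking at this stage keeps infeasible actions out of the replay buffer entirely, which is one way to address the unstable convergence the abstract mentions.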
  • M. Taghian, A. Asadi, R. Safabakhsh *

    The quality of the extracted features from a long-term sequence of raw prices of the instruments greatly affects the performance of the trading rules learned by machine learning models. Employing a neural encoder-decoder structure to extract informative features from complex input time-series has proved very effective in other popular tasks like neural machine translation and video captioning. In this paper, a novel end-to-end model based on the neural encoder-decoder framework combined with deep reinforcement learning is proposed to learn single instrument trading strategies from a long sequence of raw prices of the instrument. In addition, the effects of different structures for the encoder and various forms of the input sequences on the performance of the learned strategies are investigated. Experimental results showed that the proposed model outperforms other state-of-the-art models in highly dynamic environments.

    Keywords: Deep Reinforcement Learning, Deep Q-Learning, Single Stock Trading, Portfolio Management, Encoder-Decoder Framework
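The encoder-decoder idea above can be illustrated with a deliberately tiny, non-neural stand-in: a hand-written "encoder" that compresses a raw price window into summary features, and a rule-based "decoder" head that maps them to a trading signal. The feature set, threshold, and action names are illustrative assumptions; the paper's actual model uses neural networks trained with deep Q-learning.

```python
# Hedged, non-neural stand-in for the encoder-decoder trading pipeline.

def encode(prices):
    """Summarise a window of raw prices into a small feature vector."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    mean_ret = sum(returns) / len(returns)
    vol = (sum((r - mean_ret) ** 2 for r in returns) / len(returns)) ** 0.5
    momentum = (prices[-1] - prices[0]) / prices[0]
    return (mean_ret, vol, momentum)

def decode(features, threshold=0.01):
    """Map encoded features to a trading action (stand-in for a policy head)."""
    _, _, momentum = features
    if momentum > threshold:
        return "buy"
    if momentum < -threshold:
        return "sell"
    return "hold"

signal = decode(encode([100.0, 101.0, 103.0, 104.0]))   # upward window
```

In the paper's setting, the encoder would be a recurrent or convolutional network and the decoder a Q-value head; the point here is only the two-stage structure.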
  • Kourosh Dadashtabar Ahmadi*, Ali Akbar Kiaei, Mohammad Amin Abbaszadeh

    In this study, we investigate a deep reinforcement learning-based approach for autonomous robot navigation. Our approach is based on the DDPG algorithm and one of its improved versions, SD3. To apply this algorithm to the autonomous navigation problem, we modified it and optimized it for navigation applications. Thanks to its convolutional layers, the modified algorithm can also handle high-dimensional state spaces. To reduce the robot's oscillation while moving and to encourage faster movement through the environment, we propose two reward-shaping parameters, a reward and a penalty, based on linear and angular velocity. To improve the algorithm's generalizability, we used a procedure that periodically changes the shape and layout of obstacles in the environment, and to speed up learning and improve the robot's performance, we normalized the input data. We then implemented the proposed algorithm in the GAZEBO simulator with the ROS operating system and compared the results with the original SD3 and DDPG algorithms; the proposed algorithm outperformed both.

    Keywords: autonomous navigation, deep reinforcement learning, DDPG, SD3
    Kourosh Dadashtabar Ahmadi*, Ali Akbar Kiaei, Mohammad Amin Abbaszadeh

    In this research, we develop a deep reinforcement learning-based method for autonomous robot navigation. Our approach is based on DDPG and one of its improved versions, SD3. We modified the algorithm to make it suitable for autonomous navigation problems and optimized it accordingly. Thanks to its convolutional layers, the modified algorithm can work with high-dimensional state spaces. We also propose two reward terms, a linear-velocity reward and an angular-velocity penalty, to encourage the robot to move faster with smoother motion. To improve generalization, we used a procedure that randomly changes the shape, layout, and number of obstacles in the environment, and to speed up learning and improve the robot's performance, we normalized all input data. Finally, the proposed algorithm was implemented with ROS and Gazebo, and the results show improvements over the original SD3 and DDPG algorithms.

    Keywords: Autonomous navigation, Deep reinforcement learning, SD3, DDPG
  • Mohammadreza Moslehi *, Hossein Ebrahimpor-Komleh, Salman Goli, Reza Taji

    In recent years, the exponential growth of communication devices has made the Internet of Things (IoT) an emerging technology that enables heterogeneous devices to connect with each other across heterogeneous networks. This communication requires different levels of Quality of Service (QoS) and different policies depending on device type and location. To provide a specific level of QoS, we can combine emerging technological concepts in the IoT infrastructure: software-defined networking (SDN) and machine learning algorithms. We use deep reinforcement learning for resource management and allocation in the control plane and present an algorithm that aims to optimize resource allocation. Simulation results show that the proposed algorithm improves network performance in terms of QoS parameters, including delay and throughput, compared to Random and Round Robin methods; its performance is also on par with fuzzy and predictive methods.

    Keywords: Internet of Things, Software-Defined Networking (SDN), Deep Reinforcement Learning, QoS
  • Dadmehr Rahbari, Mohsen Nickray*, Pegah Gazori

    With the spread of IoT technology in recent years, the number of smart devices, and consequently the volume of data they collect, is rapidly increasing. On the other hand, most IoT applications require real-time data analysis and low service latency. Under such conditions, sending data to cloud data centers for processing cannot meet these applications' requirements, and the fog computing model is a better choice. Since the processing resources in the fog computing model are limited, using them effectively is especially important. This study addresses the scheduling of IoT application tasks in a fog computing environment. The main goal is to reduce service latency, and a deep reinforcement learning approach is used to achieve it. The method presented in this paper combines the Q-Learning algorithm, deep learning, experience replay, and the target network technique. Simulation results show that the DQLTS algorithm outperforms QLTS by 76% and RS by 6.5% in terms of the ASD metric, and converges faster than QLTS.

    Keywords: Internet of Things, fog computing, task scheduling, deep reinforcement learning
    Pegah Gazori, Dadmehr Rahbari, Mohsen Nickray *

    With the advent and development of IoT applications in recent years, the number of smart devices and, consequently, the volume of data collected by them are rapidly increasing. On the other hand, most IoT applications require real-time data analysis and low latency in service delivery. Under these circumstances, sending the huge volume of heterogeneous data to cloud data centers for processing and analysis is impractical, and the fog computing paradigm is a better choice. Because fog nodes have limited computational resources, their efficient utilization is of great importance. In this paper, the scheduling of IoT application tasks in the fog computing paradigm is considered. The main goal of this study is to reduce the latency of service delivery, for which we use a deep reinforcement learning approach. The proposed method combines the Q-Learning algorithm, deep learning, experience replay, and target network techniques. According to the experimental results, the DQLTS algorithm improves the ASD metric by 76% compared to QLTS and by 6.5% compared to the RS algorithm. Moreover, it reaches convergence faster than QLTS.

    Keywords: Internet of Things, Fog computing, Task Scheduling, Deep reinforcement learning
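The two DQN ingredients this abstract names, experience replay and a periodically synchronized target network, can be sketched in a minimal tabular form. The toy environment, state/action sizes, and sync period below are illustrative assumptions, not the DQLTS implementation.

```python
# Hedged tabular sketch of experience replay + target network (DQN-style).
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)   # old transitions are evicted

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, rng, batch_size):
        return rng.sample(list(self.buf), min(batch_size, len(self.buf)))

def train(num_states=4, num_actions=2, steps=200, sync_every=20,
          alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    target_q = [row[:] for row in q]        # target "network" = frozen copy
    buffer = ReplayBuffer()
    state = 0
    for t in range(steps):
        action = rng.randrange(num_actions)
        # Toy environment: action 1 always yields reward 1, action 0 yields 0.
        reward = 1.0 if action == 1 else 0.0
        next_state = (state + 1) % num_states
        buffer.push((state, action, reward, next_state))
        for s, a, r, s2 in buffer.sample(rng, 8):
            td_target = r + gamma * max(target_q[s2])   # bootstrap from target
            q[s][a] += alpha * (td_target - q[s][a])
        if t % sync_every == 0:
            target_q = [row[:] for row in q]            # periodic sync
        state = next_state
    return q

q = train()
```

Replaying decorrelated past transitions and bootstrapping from a frozen copy of the value function are exactly the stabilizers the abstract credits for DQLTS's faster convergence.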
  • Seyed Ali Khoshroo, Seyed Hossein Khasteh*

    To speed up the learning process in high-dimensional reinforcement learning problems, TD methods such as Q-learning or SARSA are usually combined with the eligibility-traces mechanism. The recently introduced Deep Q-Network (DQN) algorithm uses deep neural networks in Q-learning to enable reinforcement learning algorithms to reach a higher understanding of the visual world and to extend to problems previously considered intractable. DQN, a deep reinforcement learning algorithm, suffers from a low learning speed. In this paper, we combine the eligibility-traces mechanism, one of the fundamental techniques in reinforcement learning, with deep neural networks to improve the speed of the learning process. To compare efficiency with the DQN algorithm, experiments were performed on a number of Atari 2600 games; the experimental results show that the proposed method significantly reduces learning time compared to DQN and converges to the desired model faster.

    Keywords: deep neural networks, Deep Q Network (DQN), eligibility traces, deep reinforcement learning
    Seyed Ali Khoshroo, Seyed Hossein Khasteh*

    To accelerate the learning process in high-dimensional reinforcement learning problems, TD techniques such as Q-learning or SARSA are usually combined with the eligibility-traces mechanism. The recently introduced DQN algorithm attempts to use deep neural networks in Q-learning to enable reinforcement learning algorithms to reach a greater understanding of the visual world and to extend to problems previously considered intractable. DQN, a deep reinforcement learning algorithm, has a low learning speed. In this paper, we use the eligibility-traces mechanism, one of the basic methods in reinforcement learning, in combination with deep neural networks to improve the speed of the learning process. To compare efficiency with the DQN algorithm, a number of Atari 2600 games were tested; the experimental results show that the proposed method significantly reduces learning time compared to DQN and converges faster to the optimal model.

    Keywords: Deep Neural Networks, Deep Q Networks (DQN), Eligibility Traces, Deep Reinforcement Learning
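The eligibility-traces mechanism the paper combines with deep networks can be illustrated in its tabular form (Watkins's Q(λ)), where a decaying trace spreads each TD error back over recently visited state-action pairs. The toy chain environment and all hyperparameters below are illustrative assumptions, not the paper's Atari setup.

```python
# Hedged tabular sketch of Watkins's Q(lambda): Q-learning plus eligibility
# traces on a small chain where only reaching the last state gives reward.
import random

def q_lambda(num_states=5, alpha=0.1, gamma=0.95, lam=0.8,
             episodes=200, seed=1):
    rng = random.Random(seed)
    actions = (0, 1)                      # 0 = left, 1 = right
    q = {(s, a): 0.0 for s in range(num_states) for a in actions}
    for _ in range(episodes):
        e = {k: 0.0 for k in q}           # eligibility traces, reset per episode
        s = 0
        while s != num_states - 1:
            # epsilon-greedy action selection
            if rng.random() < 0.2:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: q[(s, b)])
            s2 = min(s + 1, num_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == num_states - 1 else 0.0
            best_next = max(q[(s2, b)] for b in actions)
            delta = r + gamma * best_next - q[(s, a)]
            e[(s, a)] += 1.0              # accumulate trace at the visited pair
            greedy = a == max(actions, key=lambda b: q[(s, b)])
            for k in e:
                q[k] += alpha * delta * e[k]
                # Watkins's variant: cut all traces after exploratory actions
                e[k] = gamma * lam * e[k] if greedy else 0.0
            s = s2
    return q

q = q_lambda()
```

The trace decay `gamma * lam` is what propagates a single reward back over many preceding steps at once, which is the speed-up the paper seeks to transfer to the deep setting.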
Note
  • Results are sorted by publication date.
  • Your keyword was searched only in the keywords field of the articles. To filter out irrelevant results, the search was limited to journals in the same subject area as the source journal.