Table of Contents

Signal and Data Processing Quarterly
Volume 16, Issue 4 (Serial No. 42, Winter 1398)

  • Publication date: 1398/12/11
  • Number of articles: 10
|
  • Mohammadreza Gandomi*, Hamid Hassanpour Pages 3-16

    Identifying network traffic is one of the basic needs of network administrators for controlling the network, improving quality of service, and maintaining network security. One of the fundamental challenges for network traffic identification methods based on statistical analysis of packets is packet loss, which seriously complicates the use of statistical features in traffic analysis. Packet loss affects statistical features of packets, such as the time interval between consecutive packets sent by an application, and in some cases reduces the traffic identification accuracy considerably. The main goal of this paper is to study the effects of packet loss on statistical features, and consequently on the accuracy of identifying applications, and to extract suitable features to overcome these effects. To this end, the behavior of four statistical features is examined, and network traffic is identified using features extracted from their distributions. A database of the traffic of seven applications with different packet-loss rates was collected, and the application identification accuracy of a neural network was analyzed. The results show that the extracted features are robust to packet loss and bring the traffic identification accuracy under various packet-loss conditions close to the ideal case (no packet loss in the network).

    Keywords: network traffic, network traffic identification, machine learning, packet loss
  • Hadi Soleimany*, Alireza Mehrdad, Saeideh Sadeghi, Farokhlagha Moazemi Pages 17-26

    Impossible differential cryptanalysis is a powerful tool for evaluating the security of block ciphers; it is based on finding a differential characteristic with probability exactly zero. The diffusion speed of the linear layer of a block cipher plays a fundamental role in the security of the algorithm against impossible differential cryptanalysis, and changing the linear layer can drastically change this security. In this paper, we present an efficient and different method for finding differential characteristics of the lightweight block cipher Zorro that is independent of the properties of the algorithm's linear layer. In other words, we show that, independent of the properties of the algorithm's components, an effective impossible differential characteristic can be obtained for nine rounds of Zorro. Based on this nine-round characteristic, we also present a key-recovery attack on ten rounds of Zorro.

    Keywords: block cipher, cryptanalysis, impossible differential cryptanalysis, Zorro block cipher algorithm
  • Seyyed Masoud Ejabati*, Seyed Hamid Zahiri Pages 27-44

    In the real world, many optimization problems are dynamic, uncertain, and complex: the objective function or the constraints can change over time, and consequently the optimum of these problems can change as well. Optimization algorithms therefore must not only find the global optimum in the search space but also track the changes of the optimum in a dynamic environment. In this paper, to achieve this capability, a new algorithm based on particle swarm optimization, called the increasing-decreasing particle swarm optimization algorithm, is proposed. By adaptively decreasing or increasing the number of particles during the optimization process, the algorithm can find and track a time-varying number of optima in environments whose changes cannot be detected. In addition, a new notion called the focused search area is defined to highlight promising regions, in order to accelerate the local search process and prevent premature convergence. The proposed algorithm is evaluated on the moving peaks benchmark and compared with several well-established algorithms. The results show the positive effect of the particle decrease/increase mechanism on the time needed to find and track multiple optima, compared with other multi-population optimization algorithms.

    Keywords: particle increase and decrease, dynamic optimization problems (DOPs), multi-population approach, particle swarm optimization
  • Alireza Eshaghpoor, Mostafa Salehi*, Vahid Ranjbar Pages 45-58

    In recent years, online social networks have been growing and changing day by day. New edges represent interactions between nodes, and predicting them is of great importance. Link prediction measures can be divided into two groups: those based on node neighborhoods and those based on path traversal. Researchers theoretically attribute the creation of new edges in a network to two causes: proximity in the graph and homophily. Despite extensive studies in network science, the joint effect of these two theoretical approaches on edge formation remains an open problem, and neighborhood-based similarity measures have not yet been studied from this perspective. In this work, we propose a model that exploits the advantages of both the graph-proximity and homophily approaches, and using it we were able to improve the accuracy of neighborhood-based similarity measures. For evaluation, two datasets were used: the Zanjan online social network and the Pokec online social network; the first dataset was collected and then completed for this study.

    Keywords: link prediction, homophily similarity, structural similarity, social networks
  • Mehdi Banitalebi Dehkordi, Abbas Ebrahimi Moghadam*, Morteza Khademi, Hadi Hadizadeh Pages 59-72

    Researchers today make extensive use of human visual attention modeling in a wide range of fields. The methods proposed in this area extract two-dimensional maps, called saliency maps, in which the value of each point indicates how strongly the corresponding point in the image attracts the viewer's attention. In this paper, to obtain the saliency map, random samples are selected from the wavelet coefficients of the image based on the compressive sampling technique. Feature maps are then generated from the selected samples. Using the obtained feature maps, a local saliency map and a global saliency map are computed. Finally, the final saliency map is obtained as a linear combination of the local and global saliency maps. Experimental evaluations show promising results, indicating that the proposed method outperforms other saliency detection models in detecting salient regions while reducing the computational load.

    Keywords: saliency map, visual attention, wavelet transform, sparsity, compressive sampling
  • Reza Mozaffari, Samira Mavaddati* Pages 73-92

    In this paper, a new method for image denoising based on incoherent dictionary learning in an adapted domain is presented. The dictionary learning procedure is based on a coherence criterion, used to obtain overcomplete dictionaries with incoherent atoms, and on a domain-adaptation method, used to reduce the processing time and obtain a more accurately denoised image. With this method, an initial dictionary is built from the available image data, and the trained atoms are then updated according to the noise present in the test environment, using a new optimization algorithm based on the limited-memory BFGS method. The sparse representation step of this algorithm is based on an algorithm that increases atom-data coherence. Training an overcomplete dictionary with incoherent atoms is very important, because it leads to a smaller approximation error in the sparse representation: mutually independent atoms play a greater role in representing the image data and cover the data space as well as possible. An incoherent sparse representation method is also used within the dictionary learning procedure. Applying this learning procedure yields a denoised image with high accuracy. The simulation results are compared with an image denoising algorithm based on basis-domain adaptation and with K-SVD dictionary learning. The simulations show that the proposed algorithm achieves better results than the other algorithms in removing Gaussian noise and, by using incoherent atoms, is able to represent the structure of the input data well.

    Keywords: denoising, image processing, dictionary learning, coherence, domain adaptation
  • Saeedeh Momtazi*, Farzaneh Torabi Pages 93-112

    Named entity recognition is one of the fundamental tasks in natural language processing and, more generally, a subtask of information extraction. In named entity recognition, we seek to find proper nouns in text and classify them into predefined categories such as names of people, organizations, places, religions, book titles, film titles, and so on. In this paper, a named entity recognition system is introduced that uses recent techniques in this field, such as two different vector representations of word meaning, one based on the word itself and one based on its constituent characters, both built with neural networks, together with deep learning methods. As part of the present research, an annotated corpus is also presented, containing three thousand abstracts from the Persian Wikipedia (ninety thousand tokens) labeled with fifteen different tags, which is an important step toward advancing future research in this area. The evaluation of the proposed system shows that, using the introduced data, an F-measure of 72.09 can be achieved.

    Keywords: named entity recognition, natural language processing, semantic word representation, deep learning
  • Amir Soltany Mahboob*, Seyed Hamid Zahiri Mamaghani Pages 113-134

    ANFIS systems have attracted much attention because of their acceptable performance in building and training fuzzy classifiers for data. A main challenge in designing an ANFIS system is achieving an efficient method with high accuracy and good interpretability. The type and location of the membership functions, as well as the way an ANFIS network is trained, undoubtedly have a substantial effect on its performance. Related research so far has only determined the type and location of the membership functions, or proposed methods for training these networks. The main reason that determining the type and location of the membership functions and training an ANFIS network have not been carried out simultaneously is the fixed length of the standard versions of heuristic methods. In this paper, a new version of the inclined planes system optimization method with a variable number of search agents is first introduced; this capability is then applied to determine the type and location of the membership functions and to simultaneously train a classifier based on the adaptive neuro-fuzzy inference system. The method is tested on several well-known datasets with different numbers of reference classes and different feature-vector lengths, and the results are reported in comparison with other approaches; these experiments show the better performance of the proposed method.

    Keywords: pattern recognition, classification, adaptive neuro-fuzzy inference system, variable-length inclined planes system optimizer
  • Vahid Sadeghi* Pages 135-150

    One of the complex cognitive activities within the phonological system of a language is that native speakers are able to perceive continuous speech as a sequence of discrete words. Previous laboratory findings on Persian and other languages have shown that, in languages where stress falls consistently (or with high frequency) at the initial or final boundary of a word, listeners use the acoustic cues of stress to segment continuous speech into its constituent words. It has also been assumed that the presence of stress in a position other than the initial or final word boundary hinders the boundary-marking function of this prosodic factor. In Persian, the presence of an enclitic in a word causes stress to fall in a non-final position. The present study was conducted to answer a fundamental question about the perceptual processing of continuous Persian speech: can listeners identify the final boundary of words (both words with enclitics and words without them), given the tonal structure of words in the intonational phonology of Persian? Two listening experiments were carried out for this purpose. The results showed that listeners identify the end point of an H plateau in the Persian intonation contour as the final boundary of a word. The results also showed that the auditory perception of the prosodic prominence pattern depends on the position of the H peak of the pitch accent.

    Keywords: word boundary, intonation, prosodic prominence, H plateau, peak position
  • Saeed Rouhani*, Tahereh Pezeshki, Babak Sohrabi Pages 151-164

    One of today's important research topics in the field of information technology is using the knowledge hidden in data that is generated at high velocity, in large volumes, and in a wide variety of formats. Data with such characteristics is called big data. Extracting, processing, and visualizing the results obtained from big data has become one of the concerns of data scientists, and many infrastructures, methods, and tools have been developed for big data analysis. The goal of this paper is to present a solution for extracting and visualizing data from the Twitter social network in real time, without using databases, as an example of big data analysis. In this research, a real-time visualization solution is presented that uses Twitter data as the input stream, Apache Storm as the processing platform, and D3.js for displaying the data. Finally, the designed dashboard is evaluated in terms of response time (latency) under different Apache Storm configurations, using design of experiments and statistical tests, and its real-time behavior is confirmed with an average response time of one minute and thirty seconds.

    Keywords: big data, visualization, real-time dashboard
|
  • Mohammadreza Gandomi*, Hamid Hassanpour Pages 3-16

    There are huge volumes of network traffic generated by various applications on the Internet. In dealing with this volume of traffic, network management plays a crucial role. Traffic classification is a basic technique used by Internet service providers (ISPs) to manage network resources and to guarantee Internet security. In addition, growing bandwidth usage on the one hand, and the limited physical capacity of communication lines on the other, lead providers to improve the utilization of network resources. In fact, classification or identification of network traffic is a critical task in network processing for traffic management, anomaly detection, and improving network quality of service (QoS). Port-based and payload-based methods are two classical techniques which are applicable under traditional network conditions. However, many Internet applications use dynamic port numbers for communication, which makes it difficult to identify traffic using port numbers. Also, many applications encrypt data before transmission to avoid detection, so payload-based techniques are inefficient for such traffic. In recent years, statistical feature-based traffic flow identification methods (STFIM) have attracted the interest of many researchers. The most important part of a STFIM is the selection of efficient statistical features. Preliminary analysis shows that packet loss during data transmission is one of the major challenges in employing a STFIM for network traffic identification. Packet loss affects the statistical characteristics of packets, such as the time interval between sending successive application packets, and in some cases significantly reduces the accuracy of traffic identification. The main goal of this paper is to examine the effects of packet loss on statistical features, and therefore on the accuracy of identifying applications, as well as to extract appropriate features to overcome these effects. For this purpose, the behavior of four statistical features, including the packet size, the time interval between sending and receiving packets, the duration of flows, and the packet sending rate, is investigated; application traffic is then identified by considering the characteristics of their distributions. We collected a database of network traffic flows from seven applications with different rates of packet loss. We used the extracted features in a multilayer neural network, as a classifier, to differentiate between the traffic of different applications. Experimental results show that the extracted features are robust against packet loss, and that the accuracy of network traffic identification stays close to the ideal case (traffic flows with no packet loss).
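    The sketch below illustrates the flow-level pipeline described above, assuming hypothetical per-flow summaries (packet size, inter-arrival time, flow duration, sending rate) reduced to simple distribution statistics and fed to a multilayer neural network; the paper's exact features, dataset, and network configuration are not reproduced.

```python
# Sketch: classify application flows from distribution statistics of four
# per-flow features (packet size, inter-arrival time, duration, send rate).
# Feature summaries and network size are illustrative, not the paper's exact setup.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def flow_features(pkt_sizes, inter_arrivals):
    """Summarize a flow by distribution statistics intended to be robust to packet loss."""
    duration = float(np.sum(inter_arrivals))
    rate = len(pkt_sizes) / duration if duration > 0 else 0.0
    stats = []
    for series in (pkt_sizes, inter_arrivals):
        stats += [np.mean(series), np.std(series),
                  np.percentile(series, 25), np.percentile(series, 75)]
    return np.array(stats + [duration, rate])

# Toy data: X holds one feature vector per flow, y the application label (0..6).
rng = np.random.default_rng(0)
X = np.vstack([flow_features(rng.integers(60, 1500, 50),
                             rng.exponential(0.01, 50)) for _ in range(700)])
y = rng.integers(0, 7, 700)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```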

    Keywords: Network Traffic, Network Traffic Identification, Machine Learning, Packet Loss
  • Hadi Soleimany*, Alireza Mehrdad, Saeideh Sadeghi, Farokhlagha Moazemi Pages 17-26

    Impossible differential cryptanalysis is a powerful tool for evaluating the security of block ciphers; it is based on finding a differential characteristic with probability exactly zero. The diffusion rate of the linear layer of a cipher plays a fundamental role in the security of the algorithm against impossible differential attacks. In this paper, we present an efficient method, independent of the quality of the linear layer, for finding impossible differential characteristics of the Zorro block cipher. In other words, using the proposed method we show that, independent of the linear layer and the other internal elements of the algorithm, it is possible to obtain an effective impossible differential characteristic for 9 rounds of Zorro. Based on this 9-round impossible differential characteristic, we also provide a key-recovery attack on a reduced 10-round version of Zorro. The method we propose for finding impossible differential characteristics of Zorro is robust and different from earlier ones, and it is independent of the algorithm's linear layer. The main observation behind it is that the number of differences that may occur in the middle of the Zorro algorithm can be very limited, owing to Zorro's unusual structure. We show how this property can be used to construct impossible differential characteristics. Using the described method, we then show that, independent of the features of the algorithm's elements, efficient 9-round impossible differential characteristics of Zorro can be obtained. It is worth noting that the best impossible differential characteristics of the AES encryption algorithm cover only four rounds; the best impossible differential characteristic of Zorro therefore covers far more rounds than the best characteristic of AES, even though both algorithms use the same linear layer. Moreover, in contrast to previous analyses, the analysis presented in this paper applies to all ciphers with the same structure as Zorro, because it is independent of the internal components of the algorithm. In particular, the presented method shows that similar impossible differential characteristics exist for all modified versions of Zorro. Zorro is a block cipher with a 128-bit block size and a 128-bit key. It consists of 6 sections, each with 4 rounds (24 rounds in all). Zorro has no key schedule; the master key is simply XORed into the state at the beginning of each section, and the internal rounds of a section do not use the key. As in AES, the Zorro state can be represented by a 4 × 4 matrix in which each of the 16 entries is one byte. One round of Zorro consists of four functions, applied in the order SB*, AC, SR, and MC. The SB* function is a nonlinear transformation applied only to the four bytes of the first row of the state matrix; in contrast to AES, where the substitution box is applied to all bytes, the Zorro substitution box is applied to only four bytes. The AC operation adds a round constant. Finally, the SR and MC transformations, the ShiftRows and MixColumns operations of the AES standard, are applied to the state matrix. Since the analyses presented in this paper are independent of the substitution properties, we do not rely on the definition of the S-box used in Zorro.
Our proposed model exploits the property of Zorro that the number of possible differences after a limited number of rounds can be much smaller than the total number of possible differences. In this paper, we identify features of Zorro that yield an upper bound on the number of possible values of an intermediate difference. We then present a model for finding impossible differential characteristics of Zorro, based on these limitations on the intermediate differences and using the miss-in-the-middle technique. Finally, we show that, based on the proposed method, an impossible differential characteristic can be found for 9 rounds of any algorithm with a Zorro-like structure, regardless of the linear layer properties, and that a key-recovery attack can be applied to 10 rounds of the algorithm. So, regardless of the features of the elements used, this number of rounds of such algorithms is not secure, even if the linear layer is changed.
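    To make the round structure described above concrete, here is a minimal sketch of one round of a Zorro-like cipher on a 4×4 byte state: the substitution box is applied only to the first row, a round constant is added, and AES-style ShiftRows and a MixColumns-style linear map follow. The S-box, the constants, and the column-mixing map below are placeholders, since the analysis in the paper is independent of these components.

```python
# Sketch of one round of a Zorro-like cipher on a 4x4 byte state.
# SBOX, round constants and the column-mixing map are placeholders: the
# impossible-differential argument above does not depend on their choice.
SBOX = list(range(256))          # placeholder 8-bit S-box (identity here)

def sb_star(state):
    # Substitution applied only to the four bytes of the first row.
    state[0] = [SBOX[b] for b in state[0]]
    return state

def add_constant(state, r):
    # Placeholder round-constant addition on the first row.
    state[0] = [b ^ ((r + i) & 0xFF) for i, b in enumerate(state[0])]
    return state

def shift_rows(state):
    # AES-style ShiftRows: row i rotated left by i positions.
    return [row[i:] + row[:i] for i, row in enumerate(state)]

def mix_columns(state):
    # Placeholder linear map on each column (XOR of the other three bytes);
    # the real cipher uses an MDS-like matrix over GF(2^8).
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        col = [state[r][c] for r in range(4)]
        for r in range(4):
            out[r][c] = col[(r + 1) % 4] ^ col[(r + 2) % 4] ^ col[(r + 3) % 4]
    return out

def zorro_like_round(state, r):
    return mix_columns(shift_rows(add_constant(sb_star(state), r)))

state = [[i * 4 + j for j in range(4)] for i in range(4)]
print(zorro_like_round([row[:] for row in state], r=1))
```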

    Keywords: block cipher, cryptanalysis, impossible differential attack, Zorro block cipher algorithm
  • Seyyed Masoud Ejabati*, Seyed Hamid Zahiri Pages 27-44

    In the real world, many optimization problems are dynamic, uncertain, and complex: the objective function or the constraints can change over time, and consequently the optimum of these problems changes as well. Optimization algorithms therefore should not only search for the global optimum in the search space but also track the path of the changing optimum in a dynamic environment. Accordingly, several researchers argue that following a set of good optima is more effective than following a single global optimum, because when the environment changes, the new global optimum can be reached more quickly from one of them. Evolutionary algorithms (EAs), inspired by biological and natural evolution, are a good option for dynamic optimization because of the ever-changing character of nature. In recent years, different methods have been proposed to adapt EAs designed for static environments. One of the most common is the multi-population method, in which the whole space is divided into sub-spaces; each sub-space covers some local optima and is assigned a sub-population. The algorithm updates the particles of each sub-space and searches for the best optimum. The most challenging issue of the multi-population method is creating the appropriate number of sub-populations and individuals to cover the different sub-spaces of the search space. In the present study, in order to deal with these challenges, a new algorithm based on particle swarm optimization, called the increasing-decreasing particle swarm optimization algorithm, is proposed. By adaptively increasing or decreasing the number of particles, the algorithm is able to find and follow a time-varying number of optima in environments whose changes cannot be detected. Another challenging issue in dynamic optimization is detecting environmental changes, which is not always possible and causes detection-based algorithms to fail; the proposed method does not need to detect environmental changes and always adapts itself to the environment. Furthermore, a focused search area is defined to emphasize promising regions, accelerate the local search process, and prevent premature convergence. The proposed algorithm is evaluated on the moving peaks benchmark and compared with several well-established algorithms. The results show the positive effect of the particle decrement/increment mechanism on the time needed to find and follow multiple optima, compared with other multi-population optimization algorithms.
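    A minimal sketch of the adaptive idea described above, assuming illustrative rules for when particles are added or removed (the paper's exact decrement/increment mechanism and its focused-search-area definition are not reproduced): a standard PSO update with the swarm size adjusted according to how spread out the swarm is.

```python
# Sketch of a PSO loop whose swarm size grows or shrinks adaptively.
# The add/remove rules, thresholds, and toy dynamic objective are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
DIM, LO, HI = 2, -5.0, 5.0

def objective(x, t):
    # Toy dynamic objective: a single peak that drifts slowly over time.
    return -np.sum((x - 0.001 * t) ** 2)

def new_particle():
    pos = rng.uniform(LO, HI, DIM)
    return {"x": pos, "v": np.zeros(DIM), "best_x": pos.copy(), "best_f": -np.inf}

swarm = [new_particle() for _ in range(10)]
for t in range(200):
    for p in swarm:
        f = objective(p["x"], t)
        if f > p["best_f"]:
            p["best_f"], p["best_x"] = f, p["x"].copy()
    gbest = max(swarm, key=lambda p: p["best_f"])
    for p in swarm:
        r1, r2 = rng.random(DIM), rng.random(DIM)
        p["v"] = 0.7 * p["v"] + 1.5 * r1 * (p["best_x"] - p["x"]) \
                              + 1.5 * r2 * (gbest["best_x"] - p["x"])
        p["x"] = np.clip(p["x"] + p["v"], LO, HI)
    # Illustrative adaptation: shrink a crowded swarm, grow a stagnant/spread-out one.
    spread = np.mean([np.linalg.norm(p["x"] - gbest["best_x"]) for p in swarm])
    if spread < 0.1 and len(swarm) > 5:
        worst = min(range(len(swarm)), key=lambda i: swarm[i]["best_f"])
        swarm.pop(worst)                      # drop the worst particle
    elif spread > 2.0 and len(swarm) < 40:
        swarm.append(new_particle())          # add an exploring particle

print("swarm size:", len(swarm), "best value found:", gbest["best_f"])
```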

    Keywords: Particle increase and decrease, Dynamic optimization problems (DOPs), Multi-population approach, Particle swarm optimization
  • Alireza Eshaghpoor, Mostafa Salehi*, Vahid Ranjbar Pages 45-58

    In recent years, with the growing number of online social networks, these networks have become one of the best markets for advertising and commerce, so studying them is very important. Most online social networks are growing and changing through new communications (new edges). Forecasting new edges in online social networks can give us a better understanding of their growth. Link prediction has many important applications, including predicting future social network interactions, the ability to manage and design useful organizational communications, and predicting and preventing relationships in terrorist gangs. There have been many studies of link prediction in both engineering and the humanities. Researchers attribute the formation of a new relationship between two individuals to two causes: (1) proximity in the graph (structure), and (2) similar properties of the two individuals (homophily). Based on these two approaches, many studies have been carried out, and researchers have presented different similarity metrics for each category. However, studying how the two approaches act together in creating new edges remains an open problem. Similarity metrics can also be divided into two categories: neighborhood-based and path-based. Neighborhood-based metrics have the advantage that they do not need access to the whole graph to be computed, whereas the whole graph must be available to calculate path-based metrics. So far, the two theoretical approaches (proximity and homophily) have not been combined in neighborhood-based metrics. In this paper, we first provide a method to determine the relative importance of graph proximity and of similar features in the formation of edges; the obtained weights are assigned to proximity and homophily, the best similarity metric in each approach is identified, and the selected homophily and structural similarity metrics are finally combined using the obtained weights. The results of this study were evaluated on two datasets: the Zanjan University Graduate School of Social Sciences network and the Pokec online social network. The first dataset was collected for this study through questionnaires and then completed. Since it is one of the few Iranian datasets compiled together with its users' attributes, it can be of great value. In this paper, we have been able to increase the accuracy of neighborhood-based similarity metrics by combining the graph-proximity and homophily approaches.
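    A minimal sketch of the weighted combination described above, assuming a Jaccard-style common-neighbours score as the neighborhood-based (proximity) metric and a simple attribute overlap as the homophily metric; both metric choices and the weight alpha are illustrative, not the ones selected in the paper.

```python
# Sketch: score a candidate edge (u, v) by mixing a neighborhood-based
# structural metric with an attribute-based homophily metric.
adj = {                       # toy undirected graph as an adjacency dict
    "a": {"b", "c"}, "b": {"a", "c", "d"},
    "c": {"a", "b"}, "d": {"b"},
}
attrs = {                     # toy node attributes (e.g. interests)
    "a": {"music", "sport"}, "b": {"music"},
    "c": {"music", "film"},  "d": {"sport", "film"},
}

def structural(u, v):
    """Neighborhood overlap (Jaccard on neighbor sets)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def homophily(u, v):
    """Attribute overlap between the two nodes (Jaccard on attribute sets)."""
    union = attrs[u] | attrs[v]
    return len(attrs[u] & attrs[v]) / len(union) if union else 0.0

def link_score(u, v, alpha=0.6):
    """Weighted mix of proximity and homophily; alpha would be tuned on data."""
    return alpha * structural(u, v) + (1 - alpha) * homophily(u, v)

# Rank all non-existing edges by the combined score.
candidates = [(u, v, round(link_score(u, v), 2))
              for u in adj for v in adj if u < v and v not in adj[u]]
print(sorted(candidates, key=lambda t: -t[2]))
```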

    Keywords: Link prediction, Homophily similarity, Network similarity, Social networks
  • Mehdi Banitalebi Dehkordi, Abbas Ebrahimi Moghadam*, Morteza Khademi, Hadi Hadizadeh Pages 59-72

    When watching natural scenes, an overwhelming amount of information is delivered to the Human Visual System (HVS); the optic nerve is estimated to receive around 10^8 bits of information per second. This amount of information cannot be processed immediately by our neural system. The visual attention mechanism enables the HVS to spend neural resources efficiently, only on the selected parts of the scene, which results in a better and faster perception of events. In order to measure saliency on visual data, subjective eye-tracking experiments may be carried out, using devices that track the eye movements of a number of subjects while they watch images or videos on a screen. Such devices, however, are not very practical, because of the difficulties involved in carrying out the experiments: the need for a restricted test environment and the fact that the experiments are time-consuming and expensive. Instead, researchers have developed computational Visual Attention Models (VAMs) that attempt to mimic the saliency prediction process of the HVS. Visual attention modeling has been widely used in various areas of image processing and understanding. Computational models of visual attention aim to predict the areas of an image that are most interesting to observers. To this end, these models produce saliency maps, in which each pixel is assigned a likelihood value of being looked at; in other words, saliency maps highlight where viewers are most likely to look in an image. Knowing the Regions of Interest (ROIs) can be helpful in applications such as image and video compression, object recognition and detection, visual search, retargeting, retrieval, image matching, and segmentation. Saliency prediction is generally done in a bottom-up, top-down, or hybrid fashion. Bottom-up approaches exploit low-level attributes such as brightness, color, edges, and texture; top-down approaches focus on context-dependent information from the scene such as the appearance of humans, animals, and text; hybrid methods combine the two streams. This paper proposes a new method of saliency prediction using sparse wavelet coefficients selected from low-level bottom-up saliency features. Wavelet-based methods are widely used in image processing algorithms, as they are especially powerful in decomposing images into several resolution scales. In our method, random compressive sampling is first performed on the wavelet coefficients in the Lab color space. Random sampling reduces the computational complexity and provides a sparse representation of the coefficients. The number of decomposition levels is chosen based on the information diffusion property of the signal. In the proposed method, the sampling can be done at a rate different from the Nyquist rate, based on the sparsity degree of the signal; it is shown that having the basis vectors of a sparse representation of the signal can result in an accurate signal reconstruction. In this work, the sparsity degree, and thus the sampling rate, is computed empirically. Next, local and global saliency maps are generated from these random samples to account for small-scale and large-scale (scene-wide) saliency attributes. These maps are then combined to form an overall saliency map, which therefore includes both local and global saliency attributes. The main contribution of this paper is the use of compressive sampling in creating a novel wavelet-domain representation for image saliency prediction.
Extensive performance evaluations show that the proposed method provides a promising saliency prediction performance while the computation complexity remains reasonable, thanks to the dimensionality reduction of compressive sampling. In particular, the proposed method demonstrated favorable precision, recall, and F-measure, when compared to state-of-the-art saliency detection methods, over large-scale datasets. We hope the proposed approach brings ideas to the saliency analysis research community.
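    A minimal sketch of the pipeline outlined above, assuming a single-level Haar-style decomposition implemented directly in NumPy and a fixed random sampling rate; the actual method's color space, decomposition depth, empirical sampling-rate selection, and local/global map construction are more elaborate.

```python
# Sketch: random (compressive-style) sampling of wavelet detail coefficients,
# then a local map (sampled detail energy) and a global map (distance from the
# mean sampled response), combined linearly. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((128, 128))            # stand-in for a grayscale / Lab channel

# One-level Haar-style decomposition (approximation + summed detail energy).
a = img[0::2, 0::2]; b = img[0::2, 1::2]; c = img[1::2, 0::2]; d = img[1::2, 1::2]
approx  = (a + b + c + d) / 4.0
details = np.abs(a - b) + np.abs(a - c) + np.abs(a + d - b - c)

# Keep only a random subset of detail coefficients (sampling rate 25%).
mask = rng.random(details.shape) < 0.25
sampled = np.where(mask, details, 0.0)

local_map  = sampled / (sampled.max() + 1e-9)
global_map = np.abs(approx - approx[mask].mean())
global_map = global_map / (global_map.max() + 1e-9)

saliency = 0.5 * local_map + 0.5 * global_map    # final linear combination
print(saliency.shape, float(saliency.max()))
```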

    Keywords: Saliency map, visual attention, wavelet transform, sparsity, compressive sampling
  • Reza Mozaffari, Samira Mavaddati* Pages 73-92

    In this paper, a new method for image denoising based on incoherent dictionary learning and a domain-transfer technique is proposed. Sparse representation is one of the most interesting areas for researchers; the goal of sparse coding is to approximately model the input data as a weighted linear combination of a small number of basis vectors. Two characteristics should be considered in the dictionary learning process: atom-data coherence and mutual coherence between dictionary atoms. The first determines the dependency between the dictionary atoms and the training data frames, and its value should be high. The second expresses the dependency between atoms, defined as the maximum absolute value of the cross-correlations between them. Higher coherence with the data class and lower mutual coherence between atoms result in a smaller approximation error in the sparse coding procedure. In the proposed dictionary learning process, a coherence criterion is employed to yield overcomplete dictionaries with incoherent atoms. The purpose of learning a dictionary with a low mutual coherence value is to reduce the approximation error of the sparse representation in the denoising process and also to decrease the computing time. We utilize the least angle regression with coherence criterion (LARC) algorithm for sparse representation based on atom-data coherence in the first step of the dictionary learning process. LARC sparse coding is an optimized generalization of the least angle regression algorithm with a stopping condition based on residual coherence, and it relies on setting a variable cardinality value. Using the atom-data coherence measure as the stopping criterion in the sparse coding process provides the capability of balancing between source confusion and source distortion. A high value of the cardinality parameter, or too dense a coding, results in source confusion, since the number of dictionary atoms used is more than what is required for a proper representation. Source degradation occurs when the sparse coding is done with a low cardinality parameter, or too sparse a coding: the number of atoms will not be enough, and the data cannot be coded exactly over them. The cardinality parameter must therefore be set precisely. The problem of finding a dictionary with low mutual coherence between its normalized atoms can be addressed by considering the Gram matrix: the mutual coherence is the maximum absolute value of the off-diagonal elements of this matrix, and if all off-diagonal elements are the same, a dictionary with the minimum self-coherence value is obtained. We also take advantage of a domain adaptation technique to transfer a learned dictionary to an adapted dictionary in the denoising process. The initial atoms are set randomly and are updated based on selected patches of the input noisy image using the proposed alternating optimization algorithm. Accordingly, the fitness function in the dictionary learning problem includes three main parts: the first term is the minimization of the approximation error, the second is the incoherence criterion on the dictionary atoms, and the last is a transformation of the initial atoms according to patches of the noisy input data in the test step. We use the limited-memory BFGS algorithm as an iterative solution for the regularized minimization of our objective function, which involves these different terms.
The simulation results show that the proposed method leads to significantly better results in comparison with the earlier methods in this context and the traditional procedures.
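    As a small illustration of the coherence terms discussed above, the sketch below computes the mutual coherence of a dictionary as the largest absolute off-diagonal entry of the Gram matrix of its normalized atoms, together with the atom-data coherence for one signal; the dimensions and random data are arbitrary.

```python
# Sketch: mutual coherence of a dictionary (max |off-diagonal| of the Gram
# matrix of normalized atoms) and atom-data coherence for one signal.
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))             # 64-dim atoms, 256 of them (overcomplete)
D /= np.linalg.norm(D, axis=0, keepdims=True)  # normalize each atom to unit length

gram = D.T @ D
off_diag = gram - np.eye(D.shape[1])
mutual_coherence = np.max(np.abs(off_diag))    # driven low during incoherent learning

x = rng.standard_normal(64)                    # one (noisy) image patch, vectorized
atom_data_coherence = np.max(np.abs(D.T @ x) / np.linalg.norm(x))  # should be high

print(f"mutual coherence: {mutual_coherence:.3f}, "
      f"atom-data coherence: {atom_data_coherence:.3f}")
```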

    Keywords: Image denoising, Dictionary learning, Coherence, Domain adaptation, Image processing
  • Saeedeh Momtazi*, Farzaneh Torabi Pages 93-112

    Named entity recognition is a fundamental task in the field of natural language processing and is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefits from neural network-based approaches for both word representation and entity tagging. In the word representation part of the proposed model, two different vector representations are used and compared: (1) the semantic representation of words based on their context using the word2vec continuous skip-gram model, and (2) the semantic representation of words based on their context as well as the characters forming them, using fastText. While the former model captures the semantic concepts of words, the latter also considers their morphological similarity. For entity identification, a deep Bidirectional Long Short-Term Memory (BiLSTM) network is used. Using an LSTM model helps to consider the history of the text when predicting entities, while the BiLSTM model expands this idea by benefiting from the history on both sides of the context. Moreover, as part of the present research, an annotated corpus containing 3000 abstracts (90,000 tokens) from the Persian Wikipedia is provided. In contrast to the available datasets in the field, which include up to 7 label types, the new dataset contains 15 different labels, namely person individual, person group, organizations, locations, religions, books, magazines, movies, languages, nationalities, events, jobs, dates, fields, and other. Developing this dataset will be an important step in promoting future research in this field, especially for tasks such as question answering that need a wider range of entity types. The results of the proposed system show that by using the introduced model and the provided data, the system can achieve an F-measure of 72.92.
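    A minimal sketch of the tagging architecture described above in PyTorch, assuming illustrative vocabulary, embedding, and tag-set sizes; the pretrained word2vec/fastText embeddings, the Persian Wikipedia data, and the training loop are omitted.

```python
# Sketch: embedding -> bidirectional LSTM -> per-token tag scores.
# Sizes are illustrative; pretrained embeddings would replace nn.Embedding weights.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=100, hidden=128, n_tags=15):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)   # 2*hidden: forward + backward states

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))      # (batch, seq_len, 2*hidden)
        return self.out(h)                         # unnormalized tag scores per token

model = BiLSTMTagger()
sentence = torch.randint(0, 20000, (1, 12))        # one toy sentence of 12 token ids
print(model(sentence).shape)                       # torch.Size([1, 12, 15])
```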

    Keywords: Named entity recognition, natural language processing, word embedding, deep learning
  • Amir Soltany Mahboob*, Seyed Hamid Zahiri Mamaghani Pages 113-134

    ANFIS systems have received much attention due to their acceptable performance in creating and training fuzzy classifiers for data. A main challenge in designing an ANFIS system is achieving an efficient method with high accuracy and suitable interpretability. Undoubtedly, the type and location of the membership functions, as well as the way an ANFIS network is trained, have a considerable effect on its performance. To date, related research has only determined the type and location of the membership functions, or suggested methods for training these networks. The main reason that the type and location of the membership functions have not been determined simultaneously with training an ANFIS network is that the standard versions of heuristic methods have a fixed length. In this paper, a new version of the inclined planes system optimization method is first introduced, in which the number of search agents can vary. This capability is then used to determine the type and location of the membership functions and to simultaneously train a classifier based on the adaptive neuro-fuzzy inference system (ANFIS). The proposed method has been tested on five benchmark datasets from the UCI repository (Iris, Breast Cancer, BUPA Liver, Wine, and Pima), which have different numbers of reference classes, different feature-vector lengths, and appropriate complexity. Initially, test accuracy on each of the selected datasets was measured with the standard 10-fold cross-validation method using the standard fixed-length versions of the heuristic methods. The same experiments were then repeated with the proposed method, and the results of applying it to the five datasets were compared with those of the fixed-length heuristic methods. The comparative results show that the optimal and intelligent design of an ANFIS classifier by variable-length heuristics yields good and satisfactory results on the five well-known datasets, and in each of the five problems it provides better answers than the other design methods for the ANFIS classification system.
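    For concreteness, the sketch below shows the forward pass of a tiny Sugeno-type ANFIS (Gaussian memberships, rule firing strengths, normalization, linear consequents). The membership and consequent parameters here are hand-picked placeholders; in the proposed method their type, location, and number would all be encoded in a variable-length candidate and tuned by the optimizer.

```python
# Sketch: forward pass of a tiny Sugeno-type ANFIS with 2 inputs and 2 rules.
# Membership/consequent parameters are placeholders; the paper's optimizer
# would search over their type, location, and number.
import numpy as np

def gauss(x, c, s):
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

# Layer 1: membership functions, one (center, sigma) per input per rule.
mf = {
    "rule1": [(0.0, 1.0), (0.0, 1.0)],
    "rule2": [(2.0, 1.0), (2.0, 1.0)],
}
# Sugeno consequents: f_r(x) = p*x0 + q*x1 + b
consequents = {"rule1": (0.5, 0.5, 0.0), "rule2": (-0.3, 0.8, 1.0)}

def anfis_forward(x):
    # Layer 2: firing strength of each rule = product of its membership degrees.
    w = {r: np.prod([gauss(x[i], c, s) for i, (c, s) in enumerate(params)])
         for r, params in mf.items()}
    # Layer 3: normalize the firing strengths.
    total = sum(w.values())
    wn = {r: wi / total for r, wi in w.items()}
    # Layers 4-5: weighted sum of the linear consequents.
    return sum(wn[r] * (p * x[0] + q * x[1] + b)
               for r, (p, q, b) in consequents.items())

print(anfis_forward(np.array([1.0, 1.5])))
```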

    Keywords: Pattern Recognition, Classifier, adaptive neuro fuzzy inference system, variable Length Inclined Planes System Optimization algorithm
  • Vahid Sadeghi* Pages 135-150

    Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retracted to a non-final position in words containing enclitic affixes. The present research explores the question as to whether Persian listeners are able to identify word boundaries given the tonal structure of words in Persian phonology. The paper was also intended to investigate to what extent Persian native speakers use H peaks to identify the word stress pattern. Two perceptual experiments were conducted in this regard. Given the tonal structure of words in utterance non-final position in Persian, it was hypothesized that listeners are likely to identify the end of a high plateau as a cue to a word boundary. In addition, given that peaks in utterance non-final position are delayed, it was further hypothesized that perceived prominence is likely to be attributed to a syllable that precedes another syllable carrying a pitch peak. The basic stimulus for the first experiment was a nonsense sequence of nine "dA" syllables with equal duration ([dA1.dA2.dA3.dA4.dA5.dA6.dA7.dA8.dA9]) across the syllables. The peak was located at the beginning of the consonant in [dA4] in the stimulus. The duration of the H plateau following the H peak was varied continuously to create 6 different stimuli with varying plateau duration. The stimuli were presented randomly to 10 native speakers of Persian. The participants were asked to chunk the sequence of identical syllables they heard into two parts, as if they were two independent words. They were also asked to identify the most prominent syllable in a separate identification test. The results showed that the ending point of a high H plateau acts as a prosodic cue to word boundary detection in Persian. For example, when the end of the H plateau was located at the end of the vowel in dA4, listeners identified the end of dA4 as the boundary between two hypothetical words. However, when the end of the plateau was located at the end of the vowel in dA5 or at the beginning of the consonant in dA6, listeners identified the end of dA5 as the word-final boundary. The results of this experiment further revealed that listeners are sensitive to the position of H peaks when identifying the within-word position of prominence in Persian. Listeners consistently identified dA3 as the most prominent syllable, as this syllable preceded dA4 on which the peak was located, and the rate of their identification was not affected by the duration of the H plateau following the pitch peak. In the second experiment, listeners' ability to use the F0 contour as a cue to word boundaries was tested on resynthesized speech in which the spectral properties of the signals were intentionally deformed. The results replicated the findings obtained in the first experiment, indicating that the end of a high plateau acts as a robust cue to word boundary detection in Persian.

    Keywords: word boundary, intonation, prosodic prominence, H plateau, position of H peaks
  • Saeed Rouhani*, Tahereh Pezeshki, Babak Sohrabi Pages 151-164

    One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing this huge amount of data has become one of the concerns of data science scholars. The impact of big data on information analysis can be traced to four different parts: data extraction and processing, data analysis, data storage, and finally the visualization of the data. For big data processing, various studies have presented different categorizations. For example, in the studies of Hashim et al., big data processing is divided into two categories: batch and real time. These two categories of processing, which nowadays are standard in any comprehensive big data solution, have also been introduced in Abawajy's studies: batch processing is related to offline processing, and real-time processing is usually used to analyze streaming data without any need to store the data on disk. As data flows in from various sources, it is analyzed and processed in real time, for immediate insight. As today's world is rapidly changing and survival in a competitive world requires instant decision-making based on flows of data, streaming data analysis is becoming increasingly important. On the other hand, one of the most valuable sources of streaming data is the data generated by the users of social networks such as Twitter. Social network data sources are very rich sources for analysis, as they reflect the opinions and views of their users. Since previous studies, such as Flash's, have focused more on batch analysis (offline data), this study investigates a variety of tools and infrastructures related to big streaming data and finally designs a real-time dashboard based on Twitter streaming data. The article addresses two research questions: (1) How can a real-time dashboard based on social network data be designed and implemented? (2) Which configurations are best suited for real-time dashboard analysis and visualization? In other words, the purpose of this article is to provide a solution for extracting and visualizing Twitter streaming data without intermediate databases, as an example of real-time big data analysis. In this research, we used Twitter streaming data as the input, Apache Storm as the processing platform, and D3.js as the visualization tool. Finally, the designed dashboard was evaluated in terms of response time (latency) under various Apache Storm configurations using the design of experiments method and statistical tests, and it was eventually confirmed that the dashboard is real time, with an average response time of 1 minute and 30 seconds.
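    As a stand-in for the Storm topology described above (which is written against Apache Storm's own Java API), the sketch below shows the same spout-bolt idea in plain Python: a simulated tweet stream is aggregated over a short window and emitted as JSON records that a D3.js dashboard could poll, with the latency of each window recorded in the spirit of the response-time evaluation. All names, fields, and parameters are illustrative.

```python
# Sketch: simulated tweet "spout" -> counting "bolt" -> JSON for a D3.js view.
# Stand-in for the Apache Storm topology; window size and fields are illustrative.
import json, time, random
from collections import Counter

def tweet_spout(n=50):
    """Simulated stream of tweets, each carrying a hashtag field."""
    tags = ["#bigdata", "#storm", "#d3js", "#streaming"]
    for _ in range(n):
        yield {"hashtag": random.choice(tags), "ts": time.time()}

def count_bolt(stream, window=10):
    """Aggregate hashtag counts over fixed-size windows and report per-window latency."""
    counts, batch_start = Counter(), time.time()
    for i, tweet in enumerate(stream, 1):
        counts[tweet["hashtag"]] += 1
        if i % window == 0:
            latency = time.time() - batch_start
            yield json.dumps({"counts": dict(counts), "latency_s": round(latency, 4)})
            counts, batch_start = Counter(), time.time()

for record in count_bolt(tweet_spout()):
    print(record)   # a real dashboard would serve these records to the browser
```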

    Keywords: Big data, visualization, real time dashboard