Speech Emotion Recognition Using Convolutional Neural Network and Data Augmentation Technique

Author(s):

Masoume Shafieian

Article Type:

Research/Original Article (holds an accredited ranking)

Abstract:

The purpose of speech emotion recognition (SER) systems is to create an emotional connection between humans and machines, since recognizing human emotions and goals helps improve human-machine interaction. Recognizing emotions from speech has challenged researchers over the past decade, but advances in artificial intelligence have eased these challenges. In this study, we use deep learning methods to improve the performance of such systems. In the first step, three-dimensional convolutional neural networks (3D CNNs) are used to learn the spectral-temporal features of speech. In the second step, to strengthen the proposed model, we use a new pyramidal concatenated 3D CNN, which is a multi-scale architecture of 3D CNNs applied over the input dimensions. In the third step, to learn the spectral-temporal features extracted by the pyramidal concatenated 3D CNN, we use a temporal capsule network, so that the model can take into account the spatial and temporal relationships in the data. We name the proposed structure, which is a powerful structure for spectral-temporal features, the MSID 3DCNN + Temporal Capsule. The final model was applied to a combination of the speech and song sets of the RAVDESS database. Comparing the results of the proposed model with those of conventional models shows the better performance of our approach. The proposed SER model achieved an accuracy of 81.77% for six emotional classes by gender.

Keywords:

Speech Emotion Recognition, Three-Dimensional Convolutional Neural Network, Temporal Capsule, RAVDESS

Language:

Persian

Published:

Journal of Vibration and Sound, Volume 11, Issue 21, 2022

Pages:

85 to 98

https://magiran.com/p2509032

Approved scientific journal

Journal of Vibration and Sound (نشریه صوت و ارتعاش)

Biannual engineering journal published in Persian and English

ISSN: 2383-1839 eISSN: 2345-623X

License holder:

Iranian Society of Acoustics and Vibration

Managing director:

Dr. Hamid Mehdigholi

Editor-in-chief:

Dr. Firooz Bakhtiari-Nejad

Journal phone: 021-81032323
