Multiple transfer learning-based multimodal sentiment analysis using weighted convolutional neural network ensemble
Analyzing the opinions of social media users can lead to a clear understanding of their attitudes toward different topics. The emotions found in these comments, feedback, or criticisms provide useful indicators for many purposes and can be divided into negative, positive, and neutral categories. Sentiment analysis is a natural language processing task used in many areas. Some social media posts are multimodal, combining multiple media such as text, images, and emoji, which provide a useful structure for extracting and better understanding emotions. This paper presents a hybrid transfer learning method for multimodal sentiment analysis that uses five pre-trained models together with hybrid convolutional networks. In this method, two pre-trained convolutional network-based models extract features from images, and three other pre-trained models extract features from texts and embed words. The extracted features are fed into hybrid convolutional networks; a visual attention mechanism focuses on the most emotionally salient regions of the images, and a multi-head attention mechanism highlights the emotional words. The classification results for images and texts are combined using a voting technique, and late fusion is then applied to determine the polarity and the final label. Empirical experiments on a standard dataset show that the proposed model achieves 96% accuracy.
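The abstract's final combination stage, weighted voting within each modality followed by late fusion across modalities, can be sketched as follows. This is a minimal illustration only: the model weights, fusion coefficient, and class probabilities below are hypothetical placeholders, not values from the paper.

```python
# Hedged sketch of weighted voting + late fusion for a 3-class
# (negative / neutral / positive) multimodal sentiment classifier.
# All weights and probabilities are illustrative assumptions.

def weighted_vote(prob_lists, weights):
    """Combine per-model class-probability vectors by weighted averaging."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total
        for c in range(n_classes)
    ]

def late_fusion(image_probs, text_probs, alpha=0.5):
    """Fuse the two modality-level score vectors; alpha weights the image branch."""
    return [alpha * i + (1 - alpha) * t for i, t in zip(image_probs, text_probs)]

LABELS = ["negative", "neutral", "positive"]

# Outputs of the two image models and three text models (hypothetical values,
# class order: negative, neutral, positive).
image_outputs = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
text_outputs = [[0.1, 0.3, 0.6], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]]

image_probs = weighted_vote(image_outputs, weights=[0.6, 0.4])
text_probs = weighted_vote(text_outputs, weights=[0.4, 0.3, 0.3])
fused = late_fusion(image_probs, text_probs, alpha=0.5)
label = LABELS[max(range(len(fused)), key=fused.__getitem__)]
print(label)  # → positive
```

Soft voting (averaging probabilities) is used here rather than hard majority voting so that each model's confidence contributes to the fused score; the paper's exact voting scheme may differ.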