Image-text sentiment analysis model based on cross-modal semantic enhancement and hybrid fusion

Lei Li; Junwen Peng

doi:10.1051/itmconf/20268301004

Open Access

Issue		ITM Web Conf., Volume 83, 2026: 2025 International Conference on Information Technology, Education and Management Innovation (ITEMI 2025)


Article Number		01004
Number of page(s)		11
DOI		https://doi.org/10.1051/itmconf/20268301004
Published online		10 March 2026

ITM Web of Conferences 83, 01004 (2026)

Lei Li^* and Junwen Peng

School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, Xinjiang, China

^* Corresponding author

Abstract

Multimodal sentiment analysis is challenging because it requires effective strategies for semantic interaction and feature fusion across modalities. In this paper, we propose an image-text sentiment classification model that combines multi-layer semantic enhancement with a hybrid fusion strategy. First, a dual-channel architecture extracts both global and local features from each modality. A self-attention mechanism then learns the internal associations within the unimodal global features for emotion classification. Concurrently, a cross-attention mechanism captures the semantic associations between image and text data. Subsequently, the Bidirectional Recurrent Attention Unit (BiRAU) is combined with the self-attention mechanism to perform feature-level fusion, which yields a multimodal emotion prediction. Finally, the image, text, and multimodal classification results are combined through decision-level fusion with a dynamic weighting scheme. Experimental results on the TumEmo and MVSA-Single datasets indicate that our model improves performance over related methods.
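To make two of the steps named in the abstract more concrete, the following is a minimal sketch in plain Python of cross-attention between modalities and of decision-level fusion with dynamic weights. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy feature vectors, and in particular the weighting rule (a softmax over each branch's peak class probability) are hypothetical, since the abstract does not specify how the dynamic weights are computed. The BiRAU feature-level fusion is not sketched because the abstract gives no details of it.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention in which each query vector from one
    modality (e.g. a text token) attends over key/value vectors from the
    other modality (e.g. image regions)."""
    d = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def dynamic_decision_fusion(branch_probs):
    """Fuse per-branch class distributions into one distribution.

    ASSUMPTION: each branch is weighted by a softmax over its own peak
    class probability. The paper's actual weighting scheme is not given
    in the abstract and may differ."""
    confidences = [max(p) for p in branch_probs]
    weights = softmax(confidences)
    n_classes = len(branch_probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, branch_probs)) for c in range(n_classes)
    ]
    return fused, weights


# Toy features (hypothetical): 2 text tokens and 3 image regions, dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced_text = cross_attention(text_feats, image_feats, image_feats)

# Toy class distributions (hypothetical) from the three classifiers.
text_probs = [0.70, 0.20, 0.10]
image_probs = [0.40, 0.35, 0.25]
multimodal_probs = [0.80, 0.15, 0.05]
fused, weights = dynamic_decision_fusion([text_probs, image_probs, multimodal_probs])
predicted_class = fused.index(max(fused))
```

Because the fusion weights sum to one and each branch outputs a probability distribution, the fused vector is itself a valid distribution. Under this assumed rule, more confident branches contribute more to the final decision.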

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
