| Issue | ITM Web Conf., Volume 78, 2025: International Conference on Computer Science and Electronic Information Technology (CSEIT 2025) |
|---|---|
| Article Number | 04032 |
| Number of page(s) | 8 |
| Section | Foundations and Frontiers in Multimodal AI, Large Models, and Generative Technologies |
| DOI | https://doi.org/10.1051/itmconf/20257804032 |
| Published online | 08 September 2025 |
Application Analysis of Multimodal Models in Hateful Meme Detection
Haide College, Ocean University of China, Qingdao, 266100, China
Hateful memes are internet memes that overlay short text on images and spread virally online, often containing offensive content that targets groups based on gender, religion, race, or other characteristics. Their rapid dissemination and harmful impact make targeted detection critically important. Multimodal models, which process images and text simultaneously, can accurately identify hateful content in memes. This paper analyzes the image-text fusion methods, optimization strategies, and evaluation metrics of multimodal models in hateful meme detection. Results show that incorporating cross-attention mechanisms during the image-text fusion stage effectively captures complementary information between modalities, thereby enhancing downstream task performance. Furthermore, optimization techniques such as multi-task learning and adversarial training can further improve model robustness and detection accuracy. Model distillation enables faster detection with minimal accuracy loss, facilitating the timely identification of newly released hateful memes. In summary, this paper argues that multimodal models hold significant potential for mitigating the spread of online hate, and its analysis of image-text fusion methods, optimization strategies, and evaluation metrics provides theoretical and practical references for related research.
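The paper does not publish code, so the following PyTorch sketch is only an illustration of the cross-attention image-text fusion the abstract refers to: text tokens attend to image patches and vice versa before a joint classification head. The module names, dimensions, and the use of `nn.MultiheadAttention` are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of cross-attention image-text fusion
# for binary hateful-meme classification. Dimensions and module choices are
# illustrative assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Text tokens attend to image patches, and vice versa, so each modality
        # can pick up complementary cues from the other.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 2),  # hateful vs. non-hateful
        )

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, text_len,  dim), e.g. from a text encoder
        # image_feats: (batch, n_patches, dim), e.g. from a vision encoder
        text_attended, _ = self.text_to_image(text_feats, image_feats, image_feats)
        image_attended, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Pool each attended sequence and concatenate for the joint prediction.
        fused = torch.cat(
            [text_attended.mean(dim=1), image_attended.mean(dim=1)], dim=-1
        )
        return self.classifier(fused)


if __name__ == "__main__":
    model = CrossAttentionFusion()
    logits = model(torch.randn(4, 32, 768), torch.randn(4, 49, 768))
    print(logits.shape)  # torch.Size([4, 2])
```

In this sketch the fused representation feeds a two-way classifier; the optimization strategies mentioned in the abstract (multi-task learning, adversarial training, distillation) would be applied on top of such a backbone rather than changing the fusion step itself.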
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

