Decoding memes: a comprehensive analysis of late and early fusion models for explainable meme analysis.

Authors

Faseela Chakkalakkal Abdullakutty

Usman Naseem



Contributors

Tat-Seng Chua (Editor)

Chong-Wah Ngo (Editor)

Ravi Kumar (Editor)

Hady W. Lauw (Editor)

Roy Ka-Wei Lee (Editor)

Abstract

Memes are important because they serve as conduits for expressing emotions, opinions, and social commentary online, providing valuable insight into public sentiment, trends, and social interactions. By combining textual and visual elements, multi-modal fusion techniques enhance meme analysis, enabling effective classification of offensive and sentiment-bearing memes. Early and late fusion methods both integrate multi-modal data effectively, but each has limitations. Early fusion merges features from the different modalities before classification; late fusion classifies each modality separately and then reclassifies the combined per-modality outcomes. This paper compares early and late fusion models for meme analysis, demonstrating their efficacy in extracting meme concepts and classifying meme reasoning. Pre-trained vision encoders, including ViT and VGG-16, and language encoders, such as BERT, ALBERT, and DistilBERT, were employed to extract image and text features, which were then used in both early and late fusion pipelines. The paper also compares the explainability of the fusion models through SHAP analysis. Comprehensive experiments with classifiers such as XGBoost and Random Forest, over combinations of vision and text features across multiple sentiment scenarios, showed late fusion to be more effective than early fusion.
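The two fusion strategies are easy to see in code. The Python sketch below is a minimal, illustrative implementation of early and late fusion over BERT text features and ViT image features; the checkpoint names, mean/[CLS] pooling choices, and classifier hyper-parameters are assumptions made for this example, not necessarily the paper's exact configuration.

# A minimal sketch of early vs. late fusion for meme classification.
# Checkpoints, pooling, and hyper-parameters are illustrative assumptions.
import numpy as np
import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
image_encoder = AutoModel.from_pretrained("google/vit-base-patch16-224-in21k")

@torch.no_grad()
def text_features(captions):
    # Mean-pooled BERT token embeddings: one 768-d vector per caption.
    batch = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")
    return text_encoder(**batch).last_hidden_state.mean(dim=1).numpy()

@torch.no_grad()
def image_features(paths):
    # ViT [CLS] embeddings: one 768-d vector per meme image.
    images = [Image.open(p).convert("RGB") for p in paths]
    batch = image_processor(images=images, return_tensors="pt")
    return image_encoder(**batch).last_hidden_state[:, 0, :].numpy()

def early_fusion(text_X, image_X, y):
    # Early fusion: concatenate modality features, then train one classifier.
    X = np.concatenate([text_X, image_X], axis=1)
    return XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X, y), X

def late_fusion(text_X, image_X, y):
    # Late fusion: classify each modality separately, then reclassify the
    # stacked per-modality class probabilities with a meta-classifier.
    text_clf = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(text_X, y)
    image_clf = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(image_X, y)
    meta_X = np.concatenate([text_clf.predict_proba(text_X),
                             image_clf.predict_proba(image_X)], axis=1)
    meta_clf = RandomForestClassifier(n_estimators=200).fit(meta_X, y)
    return text_clf, image_clf, meta_clf

Because both pipelines end in tree-based classifiers, SHAP's TreeExplainer can attribute each prediction to individual fused features, which is the kind of comparison the paper draws between the two strategies. A usage sketch, assuming text_X, image_X, and labels y have been produced from your own meme data with the feature functions above:

import shap

early_clf, X = early_fusion(text_X, image_X, y)
explainer = shap.TreeExplainer(early_clf)
shap_values = explainer.shap_values(X)  # per-sample, per-feature attributions
shap.summary_plot(shap_values, X)       # global feature-importance overview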

Citation

ABDULLAKUTTY, F. and NASEEM, U. 2024. Decoding memes: a comprehensive analysis of late and early fusion models for explainable meme analysis. In: Chua, T.-S., Ngo, C.-W., Kumar, R., Lauw, H.W. and Lee, R.K.-W. (eds.). WWW'24 companion: companion proceedings of the ACM web conference 2024, 13-17 May 2024, Singapore. New York: ACM [online], pages 1681-1689. Available from: https://doi.org/10.1145/3589335.3652504

Conference Name 2024 ACM Web Conference (WWW '24)
Conference Location Singapore
Start Date May 13, 2024
End Date May 17, 2024
Acceptance Date Mar 4, 2024
Online Publication Date May 13, 2024
Publication Date May 31, 2024
Deposit Date Jun 6, 2024
Publicly Available Date Jun 6, 2024
Publisher Association for Computing Machinery (ACM)
Pages 1681-1689
Book Title WWW'24 companion: companion proceedings of the ACM web conference 2024
DOI https://doi.org/10.1145/3589335.3652504
Keywords Explainability; Fusion; Multi-modal meme analysis
Public URL https://rgu-repository.worktribe.com/output/2368218
