FusDreamer: label-efficient remote sensing world model for multimodal data classification.
Wang, Jinping; Song, Weiwei; Chen, Hao; Ren, Jinchang; Zhao, Huimin
Authors
Jinping Wang
Weiwei Song
Hao Chen
Professor Jinchang Ren (j.ren@rgu.ac.uk), Professor of Computing Science
Huimin Zhao
Abstract
World models significantly enhance hierarchical understanding, improving data integration and learning efficiency. To explore the potential of world models in the remote sensing (RS) field, this paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). FusDreamer uses the world model as a unified representation container to abstract common, high-level knowledge, promoting interactions across different types of data, i.e., hyperspectral image (HSI), light detection and ranging (LiDAR), and text data. Initially, a new latent diffusion fusion and multimodal generation paradigm (LaMG) is utilized for its exceptional information integration and detail retention capabilities. Subsequently, an open-world knowledge-guided consistency projection (OK-CP) module incorporates prompt representations for visually described objects and aligns language and visual features through contrastive learning. In this way, the domain gap can be bridged by fine-tuning the pre-trained world models with limited samples. Finally, an end-to-end multitask combinatorial optimization (MuCO) strategy captures slight feature bias and constrains the diffusion process in a collaboratively learnable direction. Experiments conducted on four typical datasets demonstrate the effectiveness and advantages of the proposed FusDreamer.
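The abstract's OK-CP module aligns language and visual features through contrastive learning. The paper's exact formulation is not reproduced here, but a minimal NumPy sketch of a symmetric InfoNCE-style alignment loss (function names and the temperature value are illustrative assumptions, not taken from the paper) conveys the general idea: matched image-prompt pairs are pulled together while mismatched pairs are pushed apart.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_alignment_loss(visual, text, temperature=0.07):
    """Symmetric InfoNCE-style loss over matched visual/text embedding rows.

    visual, text : (N, D) arrays where row i of each is a matched pair.
    Illustrative sketch only; not the formulation used in FusDreamer.
    """
    v = l2_normalize(visual)
    t = l2_normalize(text)
    logits = v @ t.T / temperature      # (N, N) cosine-similarity logits
    labels = np.arange(len(v))          # i-th image matches i-th prompt

    def xent(lg):
        # numerically stable cross-entropy with the diagonal as targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With correctly matched pairs the loss is small; shuffling one modality's rows breaks the pairing and raises it, which is the signal that drives the alignment.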
Citation
WANG, J., SONG, W., CHEN, H., REN, J. and ZHAO, H. 2025. FusDreamer: label-efficient remote sensing world model for multimodal data classification. IEEE transactions on geoscience and remote sensing [online], Early Access. Available from: https://doi.org/10.1109/TGRS.2025.3554862
| Journal Article Type | Article |
|---|---|
| Acceptance Date | Mar 26, 2025 |
| Online Publication Date | Mar 26, 2025 |
| Deposit Date | Mar 27, 2025 |
| Publicly Available Date | Mar 27, 2025 |
| Journal | IEEE transactions on geoscience and remote sensing |
| Print ISSN | 0196-2892 |
| Electronic ISSN | 1558-0644 |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Peer Reviewed | Peer Reviewed |
| DOI | https://doi.org/10.1109/TGRS.2025.3554862 |
| Keywords | Multimodal data fusion; World model; Contrastive learning; Diffusion process |
| Public URL | https://rgu-repository.worktribe.com/output/2762032 |
| Additional Information | The code used in this article is available from: https://github.com/Cimy-wang/FusDreamer |
Files
WANG 2025 FusDreamer (AAM)
(9.9 Mb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/
Copyright Statement
© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.