FusDreamer: label-efficient remote sensing world model for multimodal data classification.

Wang, Jinping; Song, Weiwei; Chen, Hao; Ren, Jinchang; Zhao, Huimin

Abstract

World models significantly enhance hierarchical understanding, improving data integration and learning efficiency. To explore the potential of world models in the remote sensing (RS) field, this paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). FusDreamer uses the world model as a unified representation container to abstract common, high-level knowledge and promote interactions across different types of data, i.e., hyperspectral imagery (HSI), light detection and ranging (LiDAR), and text. Initially, a new latent diffusion fusion and multimodal generation paradigm (LaMG) is employed for its strong information-integration and detail-retention capabilities. Subsequently, an open-world knowledge-guided consistency projection (OK-CP) module incorporates prompt representations of visually described objects and aligns language and visual features through contrastive learning; in this way, the domain gap can be bridged by fine-tuning pre-trained world models with limited samples. Finally, an end-to-end multitask combinatorial optimization (MuCO) strategy captures slight feature bias and constrains the diffusion process in a collaboratively learnable direction. Experiments conducted on four typical datasets demonstrate the effectiveness and advantages of the proposed FusDreamer.
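
As a rough, illustrative sketch only (not taken from this record or from the authors' released code), the language-visual alignment described for the OK-CP module can be read as a symmetric contrastive (InfoNCE-style) objective between fused visual embeddings and text-prompt embeddings. All names below (contrastive_alignment_loss, visual_emb, text_emb, temperature) are hypothetical placeholders, and the snippet assumes PyTorch.

# Hypothetical sketch (not the authors' implementation): CLIP-style
# contrastive alignment between fused visual features and text-prompt
# embeddings, loosely following the abstract's description of OK-CP.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(visual_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    visual_emb: (B, D) projected HSI/LiDAR fusion features (assumed shape)
    text_emb:   (B, D) embeddings of class-describing prompts (assumed shape)
    """
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                      # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)  # matching pairs on the diagonal
    # Cross-entropy in both directions: vision-to-text and text-to-vision.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.T, targets)
    return 0.5 * (loss_v2t + loss_t2v)

# Example usage with random tensors standing in for real features.
if __name__ == "__main__":
    vis = torch.randn(8, 256)
    txt = torch.randn(8, 256)
    print(contrastive_alignment_loss(vis, txt))

In an actual pipeline, a loss of this form would typically be combined with the classification and diffusion objectives under a multitask weighting, in the spirit of the MuCO strategy; the weighting scheme itself is not specified in this record.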

Citation

WANG, J., SONG, W., CHEN, H., REN, J. and ZHAO, H. 2025. FusDreamer: label-efficient remote sensing world model for multimodal data classification. IEEE transactions on geoscience and remote sensing [online], Early Access. Available from: https://doi.org/10.1109/TGRS.2025.3554862

Journal Article Type Article
Acceptance Date Mar 26, 2025
Online Publication Date Mar 26, 2025
Deposit Date Mar 27, 2025
Publicly Available Date Mar 27, 2025
Journal IEEE transactions on geoscience and remote sensing
Print ISSN 0196-2892
Electronic ISSN 1558-0644
Publisher Institute of Electrical and Electronics Engineers (IEEE)
Peer Reviewed Peer Reviewed
DOI https://doi.org/10.1109/TGRS.2025.3554862
Keywords Multimodal data fusion; World model; Contrastive learning; Diffusion process
Public URL https://rgu-repository.worktribe.com/output/2762032
Additional Information The code used in this article is available from: https://github.com/Cimy-wang/FusDreamer

Files

WANG 2025 FusDreamer (AAM) (9.9 MB)
PDF

Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/

Copyright Statement
© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.



