
Unsupervised domain adaptation for VHR urban scene segmentation via prompted foundation model-based hybrid training joint-optimized network.

Lyu, Shuchang; Zhao, Qi; Sun, Yaxuan; Cheng, Guangliang; He, Yiwei; Wang, Guangbiao; Ren, Jinchang; Shi, Zhenwei

Authors

Shuchang Lyu

Qi Zhao

Yaxuan Sun

Guangliang Cheng

Yiwei He

Guangbiao Wang

Jinchang Ren

Zhenwei Shi



Abstract

Unsupervised Domain Adaptation for Remote Sensing Semantic Segmentation (UDA-RSSeg) aims to adapt a model trained on source domain data to target domain samples, thereby minimizing the need for annotated data across diverse remote sensing scenes. In urban planning and monitoring, UDA-RSSeg on Very-High-Resolution (VHR) images has garnered significant research interest. While recent deep learning techniques have achieved considerable success on the UDA-RSSeg task for VHR urban scenes, the domain shift issue remains a persistent challenge. Specifically, there are two primary problems: (1) severe inconsistencies in feature representation across domains with notably different data distributions, and (2) a domain gap caused by the representation bias toward source domain patterns when translating features into predictive logits. To solve these problems, we propose a prompted foundation model based hybrid training joint-optimized network (PFM-JONet) for UDA-RSSeg on VHR urban scenes. Our approach integrates the "Segment Anything Model" (SAM) as the prompted foundation model, leveraging its robust, generalized representation capabilities to alleviate feature inconsistencies. On top of the features extracted by the SAM encoder, we introduce a mapping decoder that converts SAM encoder features into predictive logits. Additionally, a prompted segmentor is employed to generate class-agnostic maps, which guide the mapping decoder's feature representations. To optimize the entire network efficiently in an end-to-end manner, we design a hybrid training scheme that integrates feature-level and logits-level adversarial training strategies alongside a self-training mechanism, enhancing the model from diverse, compatible perspectives. To evaluate the proposed PFM-JONet, we conduct extensive experiments on urban scene benchmark datasets, including ISPRS (Potsdam/Vaihingen) and CITY-OSM (Paris/Chicago). On the ISPRS datasets, PFM-JONet surpasses previous state-of-the-art (SOTA) methods by 1.60% in mean IoU across four adaptation tasks; on the CITY-OSM adaptation task, it outperforms the SOTA by 4.84% in mean IoU. These results demonstrate the effectiveness of our method, and further visualization and analysis reinforce its interpretability. The code of this paper is available at https://github.com/CV-ShuchangLyu/PFM-JONet.
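For illustration only, the sketch below shows how a hybrid training step of this general kind could be wired together in PyTorch: a foundation-model encoder, a lightweight mapping decoder producing class logits, two patch discriminators for feature-level and logits-level adversarial alignment, and a confidence-thresholded self-training term on unlabeled target images. All module names, loss weights, and the pseudo-labeling rule are placeholder assumptions, not the authors' implementation; the official code is at the GitHub link above.

    # Minimal sketch, assuming a PyTorch setup; not the authors' PFM-JONet code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MappingDecoder(nn.Module):
        """Toy stand-in for a mapping decoder: turns encoder features into class logits."""
        def __init__(self, in_ch=256, num_classes=6):
            super().__init__()
            self.head = nn.Sequential(
                nn.Conv2d(in_ch, 128, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(128, num_classes, 1),
            )
        def forward(self, feat):
            return self.head(feat)

    class PatchDiscriminator(nn.Module):
        """Small patch discriminator for feature- or logits-level domain alignment."""
        def __init__(self, in_ch):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(64, 1, 4, stride=2, padding=1),
            )
        def forward(self, x):
            return self.net(x)

    def hybrid_training_step(encoder, decoder, d_feat, d_logit,
                             src_img, src_lbl, tgt_img,
                             opt_seg, opt_disc,
                             adv_weight=0.01, pseudo_thresh=0.9):
        """One illustrative step mixing supervised, adversarial, and self-training losses."""
        bce = F.binary_cross_entropy_with_logits

        # ---- segmentation (generator) update ----
        opt_seg.zero_grad()
        src_feat, tgt_feat = encoder(src_img), encoder(tgt_img)
        src_logit, tgt_logit = decoder(src_feat), decoder(tgt_feat)

        # supervised cross-entropy on labeled source images
        loss_sup = F.cross_entropy(src_logit, src_lbl)

        # self-training: keep only confident target pseudo-labels (placeholder rule)
        with torch.no_grad():
            conf, pseudo = tgt_logit.softmax(dim=1).max(dim=1)
        loss_self = (F.cross_entropy(tgt_logit, pseudo, reduction="none")
                     * (conf > pseudo_thresh)).mean()

        # adversarial terms: push target features / logits toward the source distribution
        adv_f = d_feat(tgt_feat)
        adv_l = d_logit(tgt_logit.softmax(dim=1))
        loss_adv = bce(adv_f, torch.ones_like(adv_f)) + bce(adv_l, torch.ones_like(adv_l))

        (loss_sup + loss_self + adv_weight * loss_adv).backward()
        opt_seg.step()

        # ---- discriminator update: source treated as real (1), target as fake (0) ----
        opt_disc.zero_grad()
        loss_d = 0.0
        for disc, s, t in ((d_feat, src_feat, tgt_feat),
                           (d_logit, src_logit.softmax(dim=1), tgt_logit.softmax(dim=1))):
            out_s, out_t = disc(s.detach()), disc(t.detach())
            loss_d = loss_d + bce(out_s, torch.ones_like(out_s)) + bce(out_t, torch.zeros_like(out_t))
        loss_d.backward()
        opt_disc.step()

In such a sketch, opt_seg would cover the decoder (and any trainable encoder adapters, if the foundation-model encoder is otherwise frozen), while opt_disc covers both discriminators; the prompted segmentor guidance described in the abstract is omitted here for brevity.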

Citation

LYU, S., ZHAO, Q., SUN, Y., CHENG, G., HE, Y., WANG, G., REN, J. and SHI, Z. 2025. Unsupervised domain adaptation for VHR urban scene segmentation via prompted foundation model based hybrid training joint-optimized network. IEEE transactions on geoscience and remote sensing [online], 63, article number 4409117. Available from: https://doi.org/10.1109/tgrs.2025.3564216

Journal Article Type Article
Acceptance Date Apr 22, 2025
Online Publication Date Apr 24, 2025
Publication Date Dec 31, 2025
Deposit Date May 5, 2025
Publicly Available Date May 5, 2025
Journal IEEE transactions on geoscience and remote sensing
Print ISSN 0196-2892
Electronic ISSN 1558-0644
Publisher Institute of Electrical and Electronics Engineers (IEEE)
Peer Reviewed Peer Reviewed
Volume 63
Article Number 4409117
DOI https://doi.org/10.1109/TGRS.2025.3564216
Keywords Unsupervised domain adaptation; Semantic segmentation; Hybrid training; Prompted foundation model; Very-high-resolution images; Urban scene
Public URL https://rgu-repository.worktribe.com/output/2801786
