Enrich, distill and fuse: generalized few-shot semantic segmentation in remote sensing leveraging foundation model's assistance.

Gao, Tianyi; Ao, Wei; Wang, Xing Ao; Zhao, Yuanhao; Ma, Ping; Xie, Mengjie; Fu, Hang; Ren, Jinchang; Gao, Zhi

Authors

Tianyi Gao

Wei Ao

Xing Ao Wang

Yuanhao Zhao

Ping Ma

Mengjie Xie

Hang Fu

Jinchang Ren

Zhi Gao



Abstract

Generalized few-shot semantic segmentation (GFSS) unifies semantic segmentation with few-shot learning, showing great potential for Earth observation tasks under data scarcity, such as disaster response, urban planning, and natural resource management. GFSS requires simultaneous prediction for both base and novel classes, and the central challenge lies in balancing segmentation performance across the two. This paper therefore introduces FoMA, a novel Foundation Model Assisted GFSS framework for remote sensing images, which leverages the generic semantic knowledge inherent in foundation models. Specifically, we employ three strategies: Support Label Enrichment (SLE), Distillation of General Knowledge (DGK), and Voting Fusion of Experts (VFE). For the support images, SLE mines credible unlabeled novel categories, so that each support label can contain multiple novel classes. For the query images, DGK enables an effective transfer of the generalizable knowledge of foundation models on certain categories to the GFSS learner. Additionally, VFE integrates the zero-shot predictions of foundation models with the few-shot predictions of GFSS learners, yielding improved segmentation performance. Extensive experiments and ablation studies on the OpenEarthMap few-shot challenge dataset demonstrate that the proposed method achieves state-of-the-art performance.

Citation

GAO, T., AO, W., WANG, X.-A., ZHAO, Y., MA, P., XIE, M., FU, H., REN, J. and GAO, Z. 2024. Enrich, distill and fuse: generalized few-shot semantic segmentation in remote sensing leveraging foundation model’s assistance. In Proceedings of the 2024 IEEE (Institute of Electrical and Electronics Engineers) Computer Society conference on Computer vision and pattern recognition workshops (CVPRW 2024), 16-22 June 2024, Seattle, WA, USA. Piscataway: IEEE [online], pages 2771-2780. Available from: https://doi.org/10.1109/CVPRW63382.2024.00283

Presentation Conference Type Conference Paper (published)
Conference Name 2024 IEEE (Institute of Electrical and Electronics Engineers) Computer Society conference on Computer vision and pattern recognition workshops (CVPRW 2024)
Start Date Jun 16, 2024
End Date Jun 22, 2024
Acceptance Date Feb 26, 2024
Online Publication Date Jun 22, 2024
Publication Date Dec 31, 2024
Deposit Date Jan 21, 2025
Publicly Available Date Jan 21, 2025
Print ISSN 2160-7508
Electronic ISSN 2160-7516
Publisher Institute of Electrical and Electronics Engineers (IEEE)
Peer Reviewed Peer Reviewed
Pages 2771-2780
Series ISSN 2160-7508; 2160-7516
DOI https://doi.org/10.1109/CVPRW63382.2024.00283
Keywords Natural resources; Image recognition; Annotations; Fuses; Semantic segmentation; Urban planning; Semantics
Public URL https://rgu-repository.worktribe.com/output/2619966

Files

GAO 2024 Enrich distill and fuse (AAM) (2.4 Mb)
PDF

Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/

Copyright Statement
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
