SAM-Net: semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications.
Yang, Binchao; Xu, Xinying; Ren, Jinchang; Cheng, Lan; Guo, Lei; Zhang, Zhe
Abstract
3D scene understanding is an essential research topic in Visual Odometry (VO). VO is usually built under the assumption of a static environment, which does not always hold in real scenarios; existing works fail to account for dynamic objects, leading to poor performance. To tackle these issues, we propose SAM-Net, a self-supervised learning-based VO framework with semantic probabilistic and attention mechanisms, which jointly learns single-view depth, camera ego-motion and object detection. For depth estimation, a semantic probabilistic fusion mechanism detects dynamic objects and generates a semantic probability map as a prior, which is fed to the network to produce a more refined depth map, while an attention mechanism enhances perception in both the spatial and channel dimensions. For pose estimation, we present a novel PoseNet with atrous separable convolutions to expand the receptive field, and a photometric consistency loss is employed to alleviate the impact of large rotations. Extensive experiments on the KITTI dataset demonstrate that the proposed approach achieves excellent performance in terms of pose and depth accuracy.
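The abstract's idea of combining a photometric consistency loss with a semantic probability map can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the simple L1 photometric error, and the `1 - p` down-weighting scheme are all assumptions made for illustration; the paper's actual loss and fusion mechanism may differ.

```python
import numpy as np

def masked_photometric_loss(target, warped, dynamic_prob, eps=1e-6):
    """Hypothetical sketch of a semantically weighted photometric loss.

    target, warped : (H, W, 3) float images in [0, 1]; `warped` is the
        source frame synthesized into the target view via predicted
        depth and camera pose.
    dynamic_prob : (H, W) per-pixel probability that the pixel belongs
        to a dynamic object (e.g. from a semantic segmentation prior).

    Pixels likely to be dynamic are down-weighted, since their independent
    motion violates the static-scene assumption behind view synthesis.
    """
    weight = 1.0 - dynamic_prob                        # trust static pixels more
    per_pixel = np.abs(target - warped).mean(axis=-1)  # L1 photometric error
    return float((weight * per_pixel).sum() / (weight.sum() + eps))
```

For identical target and warped images the loss is zero; as warping errors grow on static regions the loss grows, while errors confined to high-probability dynamic regions contribute little, which is the intended effect of using the semantic map as a prior.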
Citation
YANG, B., XU, X., REN, J., CHENG, L., GUO, L. and ZHANG, Z. 2022. SAM-Net: semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications. Pattern recognition letters [online], 153, pages 126-135. Available from: https://doi.org/10.1016/j.patrec.2021.11.028
| Field | Value |
| --- | --- |
| Journal Article Type | Article |
| Acceptance Date | Nov 30, 2021 |
| Online Publication Date | Dec 3, 2021 |
| Publication Date | Jan 31, 2022 |
| Deposit Date | Aug 9, 2022 |
| Publicly Available Date | Dec 4, 2022 |
| Journal | Pattern Recognition Letters |
| Print ISSN | 0167-8655 |
| Electronic ISSN | 1872-7344 |
| Publisher | Elsevier |
| Peer Reviewed | Peer Reviewed |
| Volume | 153 |
| Pages | 126-135 |
| DOI | https://doi.org/10.1016/j.patrec.2021.11.028 |
| Keywords | Artificial intelligence; Computer vision and pattern recognition; Signal processing; Software; Visual odometry; Self-supervised deep learning; Object detection; Semantic probabilistic map; Attention mechanism |
| Public URL | https://rgu-repository.worktribe.com/output/1545070 |
Files
YANG 2022 SAM-Net (AAM) (PDF, 1.4 MB)
Publisher licence: https://creativecommons.org/licenses/by-nc-nd/4.0/