Citation: Zhenyu Li, Zehui Chen, Xianming Liu, Junjun Jiang. DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation. Machine Intelligence Research, vol. 20, no. 6, pp. 837–854, 2023. https://doi.org/10.1007/s11633-023-1458-0

DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation

doi: 10.1007/s11633-023-1458-0
More Information
  • Author Bio:

    Zhenyu Li received the B. Sc. degree in computer science from Harbin Institute of Technology, China in 2021. He is a master student in computer science at Harbin Institute of Technology, China. His research interests include depth estimation and 3D object detection. E-mail: zhenyuli17@hit.edu.cn. ORCID iD: 0000-0003-2932-9179

    Zehui Chen received the B. Sc. degree in software engineering from Tongji University, China in 2020. He is currently a Ph. D. candidate with the School of Information Science and Technology, University of Science and Technology of China (USTC), China. His research interests include object detection and multi-modal learning. E-mail: lovesnow@mail.ustc.edu.cn

    Xianming Liu received the B. Sc., M. Sc. and Ph. D. degrees in computer science from Harbin Institute of Technology (HIT), China in 2006, 2008 and 2012, respectively. In 2011, he spent half a year at the Department of Electrical and Computer Engineering, McMaster University, Canada, as a visiting student, where he was a post-doctoral fellow from 2012 to 2013. He was a project researcher with the National Institute of Informatics (NII), Japan from 2014 to 2017. He is currently a professor with the School of Computer Science and Technology, HIT. He was a recipient of the IEEE ICME 2016 Best Student Paper Award. His research interests include object detection and multi-modal learning. E-mail: csxm@hit.edu.cn

    Junjun Jiang received the B. Sc. degree in mathematics from Huaqiao University, China in 2009, and the Ph. D. degree in computer science from Wuhan University, China in 2014. From 2015 to 2018, he was an associate professor with the School of Computer Science, China University of Geosciences, China. From 2016 to 2018, he was a project researcher with the National Institute of Informatics (NII), Japan. He is currently a professor with the School of Computer Science and Technology, Harbin Institute of Technology, China. He won the Best Student Paper Runner-up Award at MMM 2017, was a finalist of the World's FIRST 10K Best Paper Award at ICME 2017, and won the Best Paper Award at IFTC 2018. He received the 2016 China Computer Federation (CCF) Outstanding Doctoral Dissertation Award and the 2015 ACM Wuhan Doctoral Dissertation Award, China. His research interests include image processing and computer vision. E-mail: jiangjunjun@hit.edu.cn (Corresponding author). ORCID iD: 0000-0002-5694-505X

  • Received Date: 2023-03-05
  • Accepted Date: 2023-05-26
  • Publish Online: 2023-09-13
  • Publish Date: 2023-12-01
  • Abstract: This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that long-range correlation is essential for accurate depth estimation. Moreover, the Transformer and convolution are good at long-range and close-range depth estimation, respectively. Therefore, we propose to adopt a parallel encoder architecture consisting of a Transformer branch and a convolution branch. The former can model global context with the effective attention mechanism, and the latter aims to preserve local information, as the Transformer lacks the spatial inductive bias needed to model such content. However, independent branches lead to a shortage of connections between features. To bridge this gap, we design a hierarchical aggregation and heterogeneous interaction module to enhance the Transformer features and model the affinity between the heterogeneous features in a set-to-set translation manner. Due to the unbearable memory cost introduced by global attention on high-resolution feature maps, we adopt the deformable scheme to reduce the complexity. Extensive experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins. The effectiveness of each proposed module is elaborately evaluated through meticulous and intensive ablation studies. (An illustrative sketch of the parallel-encoder idea follows the licence notice below.)

     

  • This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
    The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
    To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
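
The abstract describes a parallel encoder in which a Transformer branch captures long-range correlation, a convolution branch preserves local detail, and an interaction module couples the two feature sets. The sketch below illustrates that general idea only; it is not the authors' DepthFormer implementation. The class name ParallelEncoderSketch, the channel width, the use of torch.nn.TransformerEncoder, and the plain cross-attention fusion (a stand-in for the paper's hierarchical aggregation and heterogeneous interaction module and its deformable attention) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ParallelEncoderSketch(nn.Module):
    """Hypothetical two-branch encoder: Transformer branch for global context,
    convolution branch for local detail, fused by simple cross-attention."""

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        # Convolution branch: preserves local spatial detail (overall stride 4).
        self.conv_branch = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=4, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )
        # Transformer branch: global self-attention over 4x4 patch tokens.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Heterogeneous interaction: convolution features (queries) attend to
        # Transformer tokens (keys/values), a simple set-to-set fusion.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        b = x.shape[0]
        local = self.conv_branch(x)                              # (B, C, H/4, W/4)
        h, w = local.shape[2:]
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, C)
        tokens = self.transformer(tokens)                        # global context
        queries = local.flatten(2).transpose(1, 2)               # (B, N, C)
        fused, _ = self.cross_attn(queries, tokens, tokens)      # interaction
        return fused.transpose(1, 2).reshape(b, -1, h, w)        # (B, C, H/4, W/4)


# Toy usage: a depth decoder would upsample these fused features to a depth map.
if __name__ == "__main__":
    feats = ParallelEncoderSketch()(torch.randn(1, 3, 64, 64))
    print(feats.shape)  # torch.Size([1, 64, 16, 16])
```

In this sketch the convolution features query the Transformer tokens, which is one straightforward way to realize a set-to-set interaction between heterogeneous features; the paper itself additionally relies on a deformable attention scheme to keep the memory cost of attending over high-resolution feature maps tractable.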
