Citation: Wenpeng Xing, Jie Chen, Yike Guo. Robust Local Light Field Synthesis via Occlusion-aware Sampling and Deep Visual Feature Fusion. Machine Intelligence Research, vol. 20, no. 3, pp. 408–420, 2023. https://doi.org/10.1007/s11633-022-1381-9

Robust Local Light Field Synthesis via Occlusion-aware Sampling and Deep Visual Feature Fusion

doi: 10.1007/s11633-022-1381-9
More Information
  • Author Bio:

    Wenpeng Xing received the B.Eng. degree in civil engineering from Harbin Institute of Technology, China in 2017, and the Graduate Diploma in computer engineering from University of Limerick, Ireland in 2019. He is currently a Ph.D. candidate in computer science at the Department of Computer Science, Hong Kong Baptist University, China. His research interests include computational imaging, 3D content capture, modelling and rendering. E-mail: cswpxing@comp.hkbu.edu.hk ORCID iD: 0000-0001-5848-9417

    Jie Chen received the B.Sc. degree in opto-information science and engineering and the M.Eng. degree in optoelectronic information engineering, both from the School of Optical and Electronic Information, Huazhong University of Science and Technology, China in 2008 and 2011, respectively, and the Ph.D. degree in information engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore in 2016. He is currently an assistant professor at the Department of Computer Science, Hong Kong Baptist University, China. Previously, he worked as a post-doctoral research fellow at the ST Engineering-NTU Corporate Laboratory, Singapore, and then as a senior algorithm engineer at OmniVision Technologies Inc. He currently serves as an associate editor for The Visual Computer, Springer. His research interests include computational photography (light fields, high dynamic range imaging, hyperspectral imaging and computational tomography), multimedia signal capture, reconstruction and content generation (3D vision, motion and music), and AI for art-tech and humanities. E-mail: chenjie@comp.hkbu.edu.hk (Corresponding author) ORCID iD: 0000-0001-8419-4620

    Yike Guo received the B.Sc. degree (Hons.) in computing science from Tsinghua University, China in 1985, and the Ph.D. degree in computational logic from Imperial College London, UK in 1994. He is currently a chair professor in the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, where he has also served as provost since December 2022. He is a professor of computing science in the Department of Computing at Imperial College London (2002 to present, on leave since January 2020). Between 2020 and 2022, he was a professor in the Department of Computer Science, Hong Kong Baptist University, China, where he served as vice-president (research and development). He has been working on technology and platforms for scientific data analysis since the mid-1990s, with research focusing on knowledge discovery, data mining, and large-scale data management. He founded InforSense, a software company for life science and health care data analysis, and served as its CEO for several years before the company's merger with IDBS, a global advanced R&D software provider, in 2009. He is also the founding director of the Data Science Institute, Imperial College London, UK, where he also led the Discovery Science Group. He also holds the position of CTO of the tranSMART Foundation, a global open-source community using and developing data sharing and analytics technology for translational medicine. He has contributed to numerous major research projects, including the UK EPSRC platform project Discovery Net, the Wellcome Trust-funded Biological Atlas of Insulin Resistance (BAIR), and the European Commission U-BIOPRED project. He is also the principal investigator of the European Innovative Medicines Initiative eTRIKS project, a 23-million-euro project building a cloud-based informatics platform, in which tranSMART is a core component for clinicogenomic medical research, and a co-investigator of Digital City Exchange, a 5.9-million-GBP research programme exploring ways to digitally link utilities and services within smart cities. Since 2021, he has been the principal coordinator of a 52.8-million-HKD project funded by the Hong Kong Research Grants Council that investigates AI-based symbiotic creativity for art-tech. He has published over 250 articles, papers, and reports. Projects he has contributed to have been internationally recognized, including the “Most Innovative Data Intensive Application Award” at the Supercomputing 2002 conference for Discovery Net, the Bio-IT World “Best Practices Award” for U-BIOPRED in 2014, and the “Best Open Source Software Award” from ACM SIGMM in 2017. He is a Fellow of the Royal Academy of Engineering (FREng), a Member of Academia Europaea (MAE), a Fellow of the British Computer Society (FBCS), and a Fellow of the Hong Kong Academy of Engineering Sciences (FHKEng). His research interests include AI for healthcare, data mining, and art. E-mail: yikeguo@ust.hk ORCID iD: 0000-0002-3075-2161

  • Received Date: 2022-06-25
  • Accepted Date: 2022-10-18
  • Publish Online: 2023-03-17
  • Publish Date: 2023-06-01
  • Novel view synthesis has recently attracted tremendous research attention for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) from arbitrary large-baseline RGB references is a challenging problem that lacks efficient solutions with existing novel view synthesis techniques. In this work, we aim to faithfully render local immersive novel views/LF images based on large-baseline LF captures and a single RGB image in the target view. To fully exploit the precious information in the source LF captures, we propose a novel occlusion-aware source sampler (OSS) module which efficiently transfers the pixels of source views into the target view's frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only provides information for occluded regions from varying observation angles, but also effectively enhances the visual rendering quality. Experimental results show that our proposed method renders high-quality LF images/novel views with sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods. (An illustrative code sketch of the occlusion-aware sampling idea is given after this list.)


  • This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
    The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
    To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
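The abstract above describes an occlusion-aware source sampler that transfers source-view pixels into the target view's frustum. The paper's implementation details are not reproduced on this page, so the following is only a minimal, hypothetical sketch of the general idea: forward-warping source pixels with a depth map and resolving visibility with a z-buffer so that farther (occluded) surfaces do not overwrite nearer ones. All function and parameter names (warp_to_target, K_src, T_src_to_tgt, etc.) are illustrative assumptions, not the authors' actual API.

```python
# Hypothetical sketch of occlusion-aware source sampling (not the paper's code):
# forward-warp source-view pixels into the target frustum and resolve conflicts
# with a z-buffer, so only the nearest visible surface contributes per target pixel.
import numpy as np

def warp_to_target(src_rgb, src_depth, K_src, K_tgt, T_src_to_tgt, tgt_hw):
    """Splat source pixels into the target view, keeping the closest sample per pixel."""
    h_s, w_s = src_depth.shape
    h_t, w_t = tgt_hw
    # Back-project source pixels to 3D points in the source camera frame.
    ys, xs = np.meshgrid(np.arange(h_s), np.arange(w_s), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(np.float64)
    pts_src = (np.linalg.inv(K_src) @ pix.T) * src_depth.reshape(1, -1)
    # Transform into the target camera frame and project with the target intrinsics.
    pts_tgt = T_src_to_tgt[:3, :3] @ pts_src + T_src_to_tgt[:3, 3:4]
    z = pts_tgt[2]
    uv = (K_tgt @ pts_tgt)[:2] / np.clip(z, 1e-6, None)
    u, v = np.round(uv[0]).astype(int), np.round(uv[1]).astype(int)
    valid = (z > 0) & (u >= 0) & (u < w_t) & (v >= 0) & (v < h_t)
    # Z-buffer: nearer samples overwrite farther ones, making the sampling occlusion-aware.
    out_rgb = np.zeros((h_t, w_t, 3))
    out_z = np.full((h_t, w_t), np.inf)
    colors = src_rgb.reshape(-1, 3)
    for i in np.flatnonzero(valid):
        if z[i] < out_z[v[i], u[i]]:
            out_z[v[i], u[i]] = z[i]
            out_rgb[v[i], u[i]] = colors[i]
    return out_rgb, out_z
```

In the paper's setting, warped candidates of this kind from multiple large-baseline source views would then be combined with the target-view RGB image by the attention-based deep visual fusion module; the z-buffer test above is what makes the sampling step occlusion-aware in this simplified sketch.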
  • [1]
    R. C. Bolles, H. H. Baker, D. H. Marimont. Epipolar-plane image analysis: An approach to determining structure from motion. International Journal of Computer Vision, vol. 1, no. 1, pp. 7–55, 1987. DOI: 10.1007/BF00128525.
    [2]
    W. P. Xing, J. Chen, Z. F. Yang, Q. Wang, Y. K. Guo. Scale-consistent fusion: From heterogeneous local sampling to global immersive rendering. IEEE Transactions on Image Processing, vol. 31, pp. 6109–6123, 2022. DOI: 10.1109/TIP.2022.3205745.
    [3]
    R. T. Collins. A space-sweep approach to true multi-image matching. In Proceedings of CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, pp. 358–363, 1996. DOI: 10.1109/CVPR.1996.517097.
    [4]
    D. G. Dansereau, B. Girod, G. Wetzstein. LiFF: Light field features in scale and depth. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 8034–8043, 2019. DOI: 10.1109/CVPR.2019.00823.
    [5]
    N. K. Kalantari, T. C. Wang, R. Ramamoorthi. Learning-based view synthesis for light field cameras. ACM Transactions on Graphics, vol. 35, no. 6, Article number 193, 2016. DOI: 10.1145/2980179.2980251.
    [6]
    P. P. Srinivasan, T. Z. Wang, A. Sreelal, R. Ramamoorthi, R. Ng. Learning to synthesize a 4D RGBD light field from a single image. In Proceedings of IEEE International Conference on Computer Vision, Venice, Italy, pp. 2262–2270, 2017. DOI: 10.1109/ICCV.2017.246.
    [7]
    Y. L. Wang, F. Liu, Z. L. Wang, G. Q. Hou, Z. A. Sun, T. N. Tan. End-to-end view synthesis for light field imaging with pseudo 4DCNN. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 340–355, 2018. DOI: 10.1007/978-3-030-01216-8_21.
    [8]
    G. C. Wu, M. D. Zhao, L. Y. Wang, Q. H. Dai, T. Y. Chai, Y. B. Liu. Light field reconstruction using deep convolutional network on EPI. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 1638–1646, 2017. DOI: 10.1109/CVPR.2017.178.
    [9]
    H. W. F. Yeung, J. H. Hou, J. Chen, Y. Y. Chung, X. M. Chen. Fast light field reconstruction with deep coarse-to-fine modeling of spatial-angular clues. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 138–154, 2018. DOI: 10.1007/978-3-030-01231-1_9.
    [10]
    Z. T. Zhang, Y. B. Liu, Q. H. Dai. Light field from micro-baseline image pair. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 3800–3809, 2015. DOI: 10.1109/CVPR.2015.7299004.
    [11]
    J. Jin, J. H. Hou, J. Chen, H. Q. Zeng, S. Kwong, J. Y. Yu. Deep coarse-to-fine dense light field reconstruction with flexible sampling and geometry-aware fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4, pp. 1819–1836, 2022. DOI: 10.1109/TPAMI.2020.3026039.
    [12]
    X. Liu, M. H. Wang, A. Z. Wang, X. Y. Hua, S. S. Liu. Depth-guided learning light field angular super-resolution with edge-aware inpainting. The Visual Computer, vol. 38, no. 8, pp. 2839–2851, 2022. DOI: 10.1007/s00371-021-02159-6.
    [13]
    L. Y. Ruan, B. Chen, M. L. Lam. Light field synthesis from a single image using improved Wasserstein generative adversarial network. In Proceedings of the 39th Annual European Association for Computer Graphics Conference: Posters, Delft, The Netherlands, pp. 19–20, 2018.
    [14]
    J. Couillaud, D. Ziou. Light field variational estimation using a light field formation model. The Visual Computer, vol. 36, no. 2, pp. 237–251, 2020. DOI: 10.1007/s00371-018-1599-2.
    [15]
    O. Wiles, G. Gkioxari, R. Szeliski, J. Johnson. SynSin: End-to-end view synthesis from a single image. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 7465–7475, 2020. DOI: 10.1109/CVPR42600.2020.00749.
    [16]
    M. L. Shih, S. Y. Su, J. Kopf, J. B. Huang. 3D photography using context-aware layered depth inpainting. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 8025–8035, 2020. DOI: 10.1109/CVPR42600.2020.00805.
    [17]
    R. Tucker, N. Snavely. Single-view view synthesis with multiplane images. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 548–557, 2020. DOI: 10.1109/CVPR42600.2020.00063.
    [18]
    B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, A. Kar. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics, vol. 38, no. 4, Article number 29, 2019. DOI: 10.1145/3306346.3322980.
    [19]
    A. Jain, M. Tancik, P. Abbeel. Putting NeRF on a diet: Semantically consistent few-shot view synthesis. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 5865–5874, 2021. DOI: 10.1109/ICCV48922.2021.00583.
    [20]
    W. P. Xing, J. Chen. NEX.+: Novel view synthesis with neural regularisation over multi-plane images. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, pp. 1581–1585, 2022. DOI: 10.1109/ICASSP43922.2022.9746938.
    [21]
    W. P. Xing, J. Chen. Temporal-MPI: Enabling multi-plane images for dynamic scene modelling via temporal basis learning. In Proceedings of the 17th European Conference on Computer Vision, Springer, Tel Aviv, Israel, pp. 323–338, 2022. DOI: 10.1007/978-3-031-19784-0_19.
    [22]
    B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 405–421, 2020. DOI: 10.1007/978-3-030-58452-8_24.
    [23]
    P. Dai, Y. D. Zhang, Z. W. Li, S. C. Liu, B. Zeng. Neural point cloud rendering via multi-plane projection. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 7827–7836, 2020. DOI: 10.1109/CVPR42600.2020.00785.
    [24]
    V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, M. Zollhöfer. DeepVoxels: Learning persistent 3D feature embeddings. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 2437–2446, 2019. DOI: 10.1109/CVPR.2019.00254.
    [25]
    I. Choi, O. Gallo, A. Troccoli, M. H. Kim, J. Kautz. Extreme view synthesis. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Republic of Korea, pp. 7780–7789, 2019. DOI: 10.1109/ICCV.2019.00787.
    [26]
    J. Chibane, A. Bansal, V. Lazova, G. Pons-Moll. Stereo radiance fields (SRF): Learning view synthesis for sparse views of novel scenes. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, USA, pp. 7907–7916, 2021. DOI: 10.1109/CVPR46437.2021.00782.
    [27]
    A. P. Chen, Z. X. Xu, F. Q. Zhao, X. S. Zhang, F. B. Xiang, J. Y. Yu, H. Su. MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 14104–14113, 2021. DOI: 10.1109/ICCV48922.2021.01386.
    [28]
    L. Liu, Z. Y. Wang, Y. Liu, C. Xu. An immersive virtual reality system for rodents in behavioral and neural research. International Journal of Automation and Computing, vol. 18, no. 5, pp. 838–848, 2021. DOI: 10.1007/s11633-021-1307-y.
    [29]
    N. N. Zhou, Y. L. Deng. Virtual reality: A state-of-the-art survey. International Journal of Automation and Computing, vol. 6, no. 4, pp. 319–325, 2009. DOI: 10.1007/s11633-009-0319-9.
    [30]
    W. P. Xing, J. Chen. MVSPlenOctree: Fast and generic reconstruction of radiance fields in PlenOctree from multi-view stereo. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, pp. 5114–5122, 2022. DOI: 10.1145/3503161.3547795.
    [31]
    Y. Yao, Z. X. Luo, S. W. Li, T. W. Shen, T. Fang, L. Quan. Recurrent MVSNet for high-resolution multi-view stereo depth inference. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 5520–5529, 2019. DOI: 10.1109/CVPR.2019.00567.
    [32]
    Y. Yao, Z. X. Luo, S. W. Li, T. Fang, L. Quan. MVSNet: Depth inference for unstructured multi-view stereo. In European Conference on Computer Vision, Springer, Munich, Germany, pp. 785–801, 2018. DOI: 10.1007/978-3-030-01237-3_47.
    [33]
    R. Chen, S. F. Han, J. Xu, H. Su. Point-based multi-view stereo network. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Republic of Korea, pp. 1538–1547, 2019. DOI: 10.1109/ICCV.2019.00162.
    [34]
    J. Chen, J. H. Hou, Y. Ni, L. P. Chau. Accurate light field depth estimation with superpixel regularization over partially occluded regions. IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 4889–4900, 2018. DOI: 10.1109/TIP.2018.2839524.
    [35]
    R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, P. Hanrahan. Light Field Photography with A Hand-Held Plenoptic Camera, Ph.D. dissertation, Department of Computer Science, Stanford University, USA, 2005.
    [36]
    Z. H. Yu, S. H. Gao. Fast-MVSNet: Sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 1946–1955, 2020. DOI: 10.1109/CVPR42600.2020.00202.
    [37]
    H. W. F. Yeung, J. H. Hou, X. M. Chen, J. Chen, Z. B. Chen, Y. Y. Chung. Light field spatial super-resolution using deep efficient spatial-angular separable convolution. IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2319–2330, 2019. DOI: 10.1109/TIP.2018.2885236.
    [38]
    T. Porter, T. Duff. Compositing digital images. ACM SIGGRAPH Computer Graphics, vol. 18, no. 3, pp. 253–259, 1984. DOI: 10.1145/964965.808606.
    [39]
    K. Y. Luo, T. Guan, L. L. Ju, Y. S. Wang, Z. Chen, Y. W. Luo. Attention-aware multi-view stereo. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 1587–1596, 2020. DOI: 10.1109/CVPR42600.2020.00166.
    [40]
    P. H. Chen, H. C. Yang, K. W. Chen, Y. S. Chen. MVSNet++: Learning depth-based attention pyramid features for multi-view stereo. IEEE Transactions on Image Processing, vol. 29, pp. 7261–7273, 2020. DOI: 10.1109/TIP.2020.3000611.
    [41]
    X. D. Zhang, Y. T. Hu, H. C. Wang, X. B. Cao, B. C. Zhang. Long-range attention network for multi-view stereo. In Proceedings of IEEE Winter Conference on Applications of Computer Vision, Waikoloa, USA, pp. 3781–3790, 2021. DOI: 10.1109/WACV48630.2021.00383.
    [42]
    D. P. Kingma, J. Ba. Adam: A method for stochastic optimization. [Online], Available: https://arxiv.org/abs/1412.6980, 2014.
    [43]
    J. L. Schönberger, J. M. Frahm. Structure-from-motion revisited. In Proceedings of IEEE Conference Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 4104–4113, 2016. DOI: 10.1109/CVPR.2016.445.
    [44]
    C. W. Tian, Y. Xu, Z. Y. Li, W. M. Zuo, L. K. Fei, H. Liu. Attention-guided CNN for image denoising. Neural Networks, vol. 124, pp. 117–129, 2020. DOI: 10.1016/j.neunet.2019.12.024.
    [45]
    Y. Yao, Z. X. Luo, S. W. Li, J. Y. Zhang, Y. F. Ren, L. Zhou, T. Fang, L. Quan. BlendedMVS: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of IEEE/CVF Conference Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 1787–1796, 2020. DOI: 10.1109/CVPR42600.2020.00186.