Image matting estimates the opacity of foreground objects in an image. Several deep learning based methods have been proposed for image matting, and they perform well at capturing spatially close information. However, these methods fail to capture global contextual information, which has been shown to be essential for improving matting performance. The reason is that a matting image may contain up to several megapixels, which is too large for a learning-based network to cover given the limited size of its receptive field. Uniformly downsampling the matting image can alleviate this problem, but it may degrade matting performance. To solve this problem, we introduce a natural image matting method with attended global context, which extracts global contextual information from the whole image and condenses it into a size suitable for a learning-based network. Specifically, we first use a deformable sampling layer to obtain condensed foreground and background attended images. We then use a contextual attention layer to extract information related to the unknown regions from the condensed foreground and background images generated by the deformable sampling layer. In addition, our network predicts a background as well as the alpha matte in order to obtain a more purified foreground, which contributes to better qualitative performance in composition. Comprehensive experiments show that our method achieves competitive performance, both quantitatively and qualitatively, on the Composition-1k and alphamatting.com benchmarks.
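The abstract does not spell out a compositing model, but the link between a predicted background and a cleaner foreground can be illustrated with the standard compositing equation I = αF + (1 − α)B. The following is a minimal per-pixel sketch under that assumption, not the authors' implementation. The names `composite` and `recover_foreground`, the `eps` threshold, and the fallback for near-zero alpha are hypothetical choices made for illustration.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def composite(alpha: float, fg: Pixel, bg: Pixel) -> Pixel:
    """Standard compositing model: I = alpha * F + (1 - alpha) * B."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def recover_foreground(image: Pixel, alpha: float, bg: Pixel,
                       eps: float = 1e-6) -> Pixel:
    """Solve the compositing model for F, given I, alpha, and B.

    Where alpha is near zero the foreground is unobservable, so the
    observed pixel is returned unchanged (an assumed fallback).
    """
    if alpha < eps:
        return image
    fg = ((i - (1.0 - alpha) * b) / alpha for i, b in zip(image, bg))
    # Clamp to the valid range to absorb small estimation errors.
    return tuple(min(1.0, max(0.0, c)) for c in fg)


# A half-transparent red pixel observed over a green background.
observed = composite(0.5, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# Remove the green contribution using the (here, exact) alpha and background.
purified = recover_foreground(observed, 0.5, (0.0, 1.0, 0.0))
# Re-composite the purified foreground over a new blue background.
new_pixel = composite(0.5, purified, (0.0, 0.0, 1.0))
```

In this toy case the alpha and background are exact, so the recovered foreground carries no green from the old background. In practice both are network estimates, so the quality of the recovered foreground depends on how accurate those estimates are.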
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 62076162 and the Shanghai Municipal Science and Technology Major Project under Grant Nos. 2021SHZDZX0102 and 20511100300.