Funding: National Key R&D Program of China (No. 2022ZD0118401).
Abstract: Deep learning based methods have been successfully applied to the semantic segmentation of optical remote sensing images. However, as more and more remote sensing data becomes available, comprehensively utilizing multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation poses a new challenge. In addition, semantic segmentation and height estimation in remote sensing data are two strongly correlated tasks, yet existing methods usually study each task separately, which leads to high computational overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method that aggregates the complementary information of different modalities to improve the accuracy of both semantic segmentation and height estimation. Besides, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), which extracts common features in a shared network to save time and resources and then learns task-specific features in two task branches. Experimental results on the public multi-modal remote sensing dataset Potsdam show that, compared to training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
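To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-stream multi-task network with cross-modal fusion. The layer sizes, the concatenation-based fusion, and all module names besides CMFF are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch: dual-stream encoders, cross-modal fusion, a shared trunk,
# and two task heads (segmentation + height). All details are assumed.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse optical and auxiliary-modality features (assumed: concat + 1x1 conv)."""
    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        return self.project(torch.cat([feat_a, feat_b], dim=1))

class MultiTaskNet(nn.Module):
    """Shared features for both tasks, then two task-specific branches."""
    def __init__(self, in_ch: int = 3, channels: int = 64, num_classes: int = 6):
        super().__init__()
        def encoder(c_in):
            return nn.Sequential(
                nn.Conv2d(c_in, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.enc_optical = encoder(in_ch)  # e.g. RGB imagery
        self.enc_aux = encoder(1)          # e.g. a DSM or SAR channel
        self.fusion = CrossModalFusion(channels)
        self.shared = encoder(channels)    # common features for both tasks
        self.seg_head = nn.Conv2d(channels, num_classes, 1)  # semantic logits
        self.height_head = nn.Conv2d(channels, 1, 1)         # per-pixel height

    def forward(self, optical, aux):
        fused = self.fusion(self.enc_optical(optical), self.enc_aux(aux))
        shared = self.shared(fused)
        return self.seg_head(shared), self.height_head(shared)

# Joint training would combine e.g. cross-entropy (segmentation) and L1 (height).
model = MultiTaskNet()
seg_logits, height = model(torch.randn(2, 3, 128, 128), torch.randn(2, 1, 128, 128))
```

The point of the shared trunk is that one forward pass serves both heads, which is what allows the reported training-time savings over two independent networks.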
Funding: Supported by the National Natural Science Foundation of China (No. 41871378), the Youth Innovation Promotion Association Foundation of the Chinese Academy of Sciences (No. Y9C0060), the Fundamental Research Funds for the Central Universities (No. 070323006), and the State Key Laboratory of Networking and Switching Technology (No. 600123442).
Abstract: Building pattern recognition is important for understanding urban forms, automating map generalization, and visualizing 3D city models. However, current approaches based on object-independent methods have limitations in capturing all visually aware patterns due to the part-based nature of human vision. Moreover, these approaches also suffer from inefficiencies when applying proximity graph models. To address these limitations, we propose a framework that leverages multi-scale data and a knowledge graph, focusing on recognizing C-shaped building patterns. We first employ a specialized knowledge graph to represent the relationships between buildings within and across various scales. Subsequently, we convert the rules for C-shaped pattern recognition and enhancement into query conditions, where enhancement refers to using patterns recognized at one scale to improve pattern recognition at other scales. Finally, rule-based reasoning is applied within the constructed knowledge graph to recognize and enrich C-shaped building patterns. We verify the effectiveness of our method using multi-scale data with three levels of detail (LODs) collected from AMap; compared to existing methods with similar precision rates, our method achieves recall rates that are 26.4% higher for LOD1, 20.0% higher for LOD2, and 9.1% higher for LOD3, along with recognition efficiency improvements of 0.91, 1.37, and 9.35 times, respectively.
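The following Python sketch illustrates the general idea of encoding cross-scale building relationships in a graph and enriching patterns by rule-based reasoning. The node attributes, the "generalizes_to" relation, and the propagation rule are placeholder assumptions; the paper's actual knowledge graph schema and query conditions are not reproduced here.

```python
# Hedged sketch: a toy knowledge graph of buildings across LODs, with one
# hand-written rule that propagates a recognized C-shape to a linked scale.
import networkx as nx

kg = nx.DiGraph()
# Nodes are buildings at a given level of detail (LOD); edges capture
# across-scale links between representations of the same building group.
kg.add_node("b1_lod1", lod=1, shape="C")        # recognized at fine scale
kg.add_node("b1_lod2", lod=2, shape="unknown")  # not yet recognized
kg.add_edge("b1_lod1", "b1_lod2", relation="generalizes_to")

def propagate_c_patterns(graph: nx.DiGraph) -> None:
    """Simplified stand-in for the query conditions: a pattern recognized
    at one scale enhances recognition at the scales it links to."""
    for u, v, data in graph.edges(data=True):
        if data.get("relation") == "generalizes_to" and graph.nodes[u].get("shape") == "C":
            graph.nodes[v]["shape"] = "C"  # enrich the linked scale

propagate_c_patterns(kg)
print(kg.nodes["b1_lod2"]["shape"])  # -> "C"
```

In a real system the rule body would test geometric conditions (arrangement, gaps, orientation of building parts) rather than a stored label, but the graph-query structure is the same.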
Funding: Supported by the Special Funds for Creative Research (Grant No. 2022C61540) and the National Natural Science Foundation of China (NSFC, Grant Nos. 61972012 and 61732016).
Abstract: Intrinsic image decomposition is an important and long-standing computer vision problem. Given an input image, recovering the physical scene properties is ill-posed. Several physically motivated priors have been used to restrict the solution space of the optimization problem for intrinsic image decomposition. This work takes advantage of deep learning and shows that it can solve this challenging computer vision problem with high efficiency. The focus lies in the feature encoding phase, which extracts discriminative features for the different intrinsic layers of an input image. To achieve this goal, we explore the distinctive characteristics of different intrinsic components in a high-dimensional feature embedding space. We define a feature distribution divergence to efficiently separate the feature vectors of different intrinsic components, and the feature distributions are further constrained to fit the real ones through a feature distribution consistency term. In addition, a data refinement approach is provided to remove data inconsistency from the Sintel dataset, making it more suitable for intrinsic image decomposition. Our method is also extended to intrinsic video decomposition based on pixel-wise correspondences between adjacent frames. Experimental results indicate that our proposed network structure outperforms the existing state-of-the-art.
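As an illustration of what a "feature distribution divergence" between intrinsic components could look like, the sketch below fits a diagonal Gaussian to each component's feature vectors and computes a symmetric KL divergence between them. This formulation is a guess at the idea, not the paper's actual definition of the divergence or consistency terms.

```python
# Hedged sketch: push apart the feature distributions of two intrinsic
# components (e.g. reflectance vs. shading) via symmetric KL divergence.
import torch

def gaussian_stats(feats: torch.Tensor):
    """feats: (N, D) feature vectors of one intrinsic component."""
    mu = feats.mean(dim=0)
    var = feats.var(dim=0) + 1e-6  # diagonal covariance with numerical floor
    return mu, var

def symmetric_kl(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between diagonal Gaussians fit to each feature set."""
    mu_a, var_a = gaussian_stats(feats_a)
    mu_b, var_b = gaussian_stats(feats_b)
    kl_ab = 0.5 * (var_a / var_b + (mu_b - mu_a) ** 2 / var_b
                   - 1 + torch.log(var_b / var_a)).sum()
    kl_ba = 0.5 * (var_b / var_a + (mu_a - mu_b) ** 2 / var_a
                   - 1 + torch.log(var_a / var_b)).sum()
    return kl_ab + kl_ba

# Training would maximize the divergence (minimize its negative) so that
# the two intrinsic components occupy separable regions of feature space.
reflectance = torch.randn(256, 64)
shading = torch.randn(256, 64) + 2.0
separation_loss = -symmetric_kl(reflectance, shading)
```

A consistency term, as the abstract describes, would additionally pull each predicted distribution toward statistics computed from ground-truth intrinsic layers.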