Funding: Supported by the National Natural Science Foundation of China (41431177 and 41601413), the National Basic Research Program of China (2015CB954102), the Natural Science Research Program of Jiangsu Province, China (BK20150975 and 14KJA170001), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China.
Abstract: In addition to soil samples, conventional soil maps, and experienced soil surveyors, text about soils (e.g., soil survey reports) is an important potential data source for extracting soil–environment relationships. Because the words describing soil–environment relationships are often mixed with unrelated words, the first step is to extract the relevant words and organize them in a structured way. This paper applies natural language processing (NLP) techniques to automatically extract and structure information on soil–environment relationships from soil survey reports. The method includes two steps: (1) construction of a knowledge frame and (2) information extraction using either a rule-based method or a statistics-based method, depending on the type of information. For information written in a uniform style, the rule-based approach was used; these variables include slope, elevation, accumulated temperature, annual mean temperature, annual precipitation, and frost-free period. For information written in diverse styles, the statistics-based method was adopted; these variables include landform and parent material. Soil survey reports describing the soil species of China were selected as the experimental dataset. Precision (P), recall (R), and F1-measure (F1) were used to evaluate the performance of the method. For the rule-based method, P was 1 (i.e., 100%), R was above 92%, and F1 was above 96% for all the variables involved. For the method based on conditional random fields (CRFs), the P, R, and F1 values for parent material were 84.15%, 83.13%, and 83.64%, respectively; the values for landform were 88.33%, 76.81%, and 82.17%, respectively. To explore the impact of text type on the performance of the CRFs-based method, CRFs models were trained and validated separately on the descriptive texts of soil types and of typical profiles. For parent material, the maximum F1 value was 90.7% for the descriptive text of soil types but only 75% for the descriptive text of soil profiles. For landform, the maximum F1 value for the descriptive text of soil types was 85.33%, similar to that for the descriptive text of soil profiles (85.71%). These results suggest that NLP techniques are effective for extracting and structuring soil–environment relationship information from text data sources.
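The F1 values reported above are the harmonic mean of P and R, F1 = 2PR/(P + R). The short Python sketch below recomputes the CRF results from the reported P and R; the function name `f1_score` is an illustrative choice, not code from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R (in %) reported above for the CRF-based method.
reported = {
    "parent material": (84.15, 83.13),
    "landform": (88.33, 76.81),
}

for variable, (p, r) in reported.items():
    # Prints 83.64% for parent material and 82.17% for landform.
    print(f"{variable}: F1 = {f1_score(p, r):.2f}%")
```

With P = 1 the formula reduces to F1 = 2R/(1 + R), so an F1 above 96% for the rule-based method corresponds to a recall of roughly 92.3% or more.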
Funding: Supported by the National Natural Science Foundation of China (41431177 and 41422109), the Innovation Project of State Key Laboratory of Resources and Environmental Information System of China (O88RA20CYA), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China.
Abstract: Conventional soil maps contain valuable knowledge on soil–environment relationships. Such knowledge can be extracted for use when updating conventional soil maps with improved environmental data. Existing methods treat all polygons of the same map unit on a map as a whole when extracting the soil–environment relationship. Such an approach ignores differences in the environmental conditions represented by individual soil polygons of the same map unit. This paper proposes a method of mining soil–environment relationships from individual soil polygons to update conventional soil maps. The proposed method consists of three major steps. First, the soil–environment relationships represented by each individual polygon on a conventional soil map are extracted in the form of frequency distribution curves for the environmental covariates involved. Second, for each environmental covariate, the frequency distribution curves from individual polygons of the same soil map unit are synthesized to form the overall soil–environment relationship for that map unit across the mapped area. Finally, the extracted soil–environment relationships are applied to update the conventional soil map with new, improved environmental data within a soil land inference model (SoLIM) framework. This study applied the proposed method to updating a conventional soil map of the Raffelson watershed in La Crosse County, Wisconsin, United States. The result was compared with that of the previous method, which treats all polygons within the same soil map unit as a whole. Evaluation with independent soil samples showed that the proposed method achieved higher accuracy.
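The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: covariate values are binned into fixed-width histograms; the per-polygon curves of one map unit are combined by a weighted average (the abstract does not specify the synthesis rule); and, in the spirit of SoLIM, a location's membership is the curve value at its covariate value scaled by the curve's peak, combined across covariates by the minimum (limiting factor). All function names are hypothetical.

```python
from collections import Counter


def polygon_histogram(values, bin_width):
    """Step 1: relative frequency distribution of one covariate inside one polygon."""
    counts = Counter(int(v // bin_width) for v in values)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def synthesize(histograms, weights):
    """Step 2: merge per-polygon curves of one map unit (weighted average; assumption)."""
    total_w = sum(weights)
    merged = Counter()
    for hist, w in zip(histograms, weights):
        for b, f in hist.items():
            merged[b] += f * w / total_w
    return dict(merged)


def membership(value, curve, bin_width):
    """Optimality of a covariate value: its bin frequency scaled by the curve's peak."""
    peak = max(curve.values())
    return curve.get(int(value // bin_width), 0.0) / peak


def solim_membership(location, curves, bin_widths):
    """Step 3: limiting-factor (minimum) combination across covariates (assumption)."""
    return min(membership(location[k], curves[k], bin_widths[k]) for k in curves)


# Usage with two hypothetical polygons of one map unit (slope in degrees).
slope_curve = synthesize(
    [polygon_histogram([2, 3, 4, 12], 5), polygon_histogram([6, 7, 8, 9], 5)],
    weights=[1, 1],
)
print(membership(3, slope_curve, 5))  # 0.75
```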
Funding: Under the auspices of the Special Fund for Ocean Public Welfare Profession Scientific Research (No. 201105020), the National Natural Science Foundation of China (No. 41471178, 41023010, 41431177), and the National Key Technology Innovation Project for Water Pollution Control and Remediation (No. 2013ZX07103006).
Abstract: The spatial distribution of soil salinity can be estimated from environmental factors because soil salinity is strongly affected, and indicated, by them. Unlike other properties such as soil texture, soil salinity varies over short time scales, so choosing powerful environmental predictors is especially important. This paper presents a similarity-based prediction approach to map soil salinity and identifies powerful environmental predictors for the Huanghe (Yellow) River Delta area in China. The similarity-based approach predicts soil salinity at unsampled locations based on the environmental similarity between unsampled and sampled locations. A dataset of 92 points with salt data at a depth of 30–40 cm was divided into two subsets for prediction and validation. Topographic parameters, soil texture, distances to irrigation channels and to the coastline, land surface temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS), and Normalized Difference Vegetation Indices (NDVIs) and land surface reflectance from Landsat Thematic Mapper (TM) imagery were generated as environmental factors. The similarity-based prediction approach was applied to several combinations of environmental factors. Based on three evaluation indices, namely the correlation coefficient (CC) between observed and predicted values, the mean absolute error, and the root mean squared error, we found that elevation, distance to irrigation channels, soil texture, night land surface temperature, NDVI, and land surface reflectance Band 5 form the optimal combination for mapping soil salinity at the 30–40 cm depth in the study area (CC of 0.69, root mean squared error of 0.38). Our results indicate that the similarity-based prediction approach could be a valuable alternative to other methods for mapping soil salinity, especially in areas with limited observation data, and could be used to monitor soil salinity distributions in the future.
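A minimal Python sketch of similarity-based prediction and of the three evaluation indices follows. The similarity function (a Gower-style 1 - |difference|/range per covariate, combined by the minimum) and the similarity-weighted average are assumptions for illustration; the paper's exact formulation may differ. Function and covariate names are hypothetical.

```python
import math


def similarity(a, b, ranges):
    """Environmental similarity of two locations (Gower-style, min over covariates; assumption)."""
    return min(max(0.0, 1 - abs(a[k] - b[k]) / ranges[k]) for k in ranges)


def predict(target, samples, ranges):
    """Similarity-weighted mean of sampled salinity values.

    samples: list of (covariate dict, observed salinity) pairs.
    """
    weights = [similarity(target, env, ranges) for env, _ in samples]
    total = sum(weights)
    if total == 0:
        return math.nan  # no sample is environmentally similar to the target
    return sum(w * value for w, (_, value) in zip(weights, samples)) / total


def evaluate(observed, predicted):
    """Return (CC, MAE, RMSE) between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cc = cov / math.sqrt(var_o * var_p)  # undefined if either series is constant
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return cc, mae, rmse


# Usage with two hypothetical sampled points.
samples = [({"elevation": 0.0, "ndvi": 0.2}, 1.0),
           ({"elevation": 10.0, "ndvi": 0.2}, 3.0)]
ranges = {"elevation": 10.0, "ndvi": 1.0}
print(predict({"elevation": 5.0, "ndvi": 0.2}, samples, ranges))  # 2.0
```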
Funding: Under the auspices of the Priority Academic Program Development of Jiangsu Higher Education Institutions and the National Natural Science Foundation of China (No. 41271438, 41471316, 41401440, 41671389).
Abstract: Gully feature mapping is an indispensable prerequisite for the monitoring and control of gully erosion, which is a widespread natural hazard. The increasing availability of high-resolution Digital Elevation Models (DEMs) and remote sensing imagery, combined with advances in object-based methods, enables automatic gully feature mapping. However, few studies have specifically focused on gully feature mapping at different scales. In this study, an object-based approach to two-level gully feature mapping, covering gully-affected areas and bank gullies, was developed and tested on a 1-m DEM and WorldView-3 imagery of a catchment in the Chinese Loess Plateau. The methodology consists of data preparation, image segmentation, metric calculation, and random-forest-based classification. The two-level mapping results were obtained with a random forest model after investigating the effects of feature selection and the class-imbalance problem. Results show that the segmentation strategy adopted in this paper, which considers topographic information and an optimal parameter combination, improves the segmentation results. The distribution of gully-affected areas is closely related to topographic information, whereas spectral features are more dominant for bank gully mapping. The highest overall accuracy of gully-affected area mapping was 93.06%, obtained with four topographic features. The highest overall accuracy of bank gully mapping was 78.5%, obtained when all features were used. The proposed approach is a credible option for hierarchical mapping of gully features and is suitable for application in the hilly Loess Plateau region.
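The metric-calculation step links segmentation to classification: each image object receives summary statistics of the topographic and spectral layers it covers, and these become the features passed to the random forest. The Python sketch below computes per-object means on small in-memory grids, plus the overall accuracy of a confusion matrix. The grid representation and function names are assumptions for illustration; a real workflow would use GIS or raster libraries.

```python
from collections import defaultdict


def object_metrics(segments, layers):
    """Mean value of each layer within each image object (segment).

    segments: 2-D list of segment ids.
    layers:   dict mapping a layer name to a 2-D list of the same shape.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r, row in enumerate(segments):
        for c, seg in enumerate(row):
            counts[seg] += 1
            for name, grid in layers.items():
                sums[seg][name] += grid[r][c]
    return {seg: {name: sums[seg][name] / counts[seg] for name in layers}
            for seg in counts}


def overall_accuracy(confusion):
    """Share of correctly classified objects: trace / total of a confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Usage on a hypothetical 2 x 3 tile with two segments and a slope layer.
segments = [[1, 1, 2], [1, 2, 2]]
slope = [[10, 20, 30], [30, 40, 50]]
print(object_metrics(segments, {"slope": slope}))  # {1: {'slope': 20.0}, 2: {'slope': 40.0}}
```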
Funding: Supported by grants from the National Natural Science Foundation of China (41431177 and 41871300), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China, the Innovation Project of State Key Laboratory of Resources and Environmental Information System (LREIS), China (O88RA20CYA), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China. Support to A-Xing Zhu was provided through the Vilas Associate Award, the Hammel Faculty Fellow Award, and the Manasse Chair Professorship from the University of Wisconsin-Madison.
Abstract: Selecting a proper set of covariates is one of the most important factors influencing the accuracy of digital soil mapping (DSM). Statistical or machine learning methods for selecting DSM covariates are not applicable in situations with limited samples. To solve this problem, this paper proposed a case-based method that formalizes the covariate selection knowledge contained in practical DSM applications. The proposed method trained Random Forest (RF) classifiers with DSM cases extracted from practical DSM applications and then used the trained classifiers to determine whether each potential covariate should be used in a new DSM application. In this study, we took topographic covariates as examples and extracted 191 DSM cases from 56 peer-reviewed journal articles to evaluate the performance of the proposed case-based method by Leave-One-Out cross validation. Compared with the way novices commonly select DSM covariates, the proposed case-based method improved accuracy by more than 30% according to three quantitative evaluation indices (i.e., recall, precision, and F1-score). The proposed method could also be applied to selecting proper sets of covariates in similar geographical modeling domains, such as landslide susceptibility mapping and species distribution modeling.
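Leave-One-Out cross validation can be sketched as below: each DSM case is held out once, a classifier is trained on the remaining cases, and the held-out case is predicted. To keep the example dependency-free, a 1-nearest-neighbour classifier stands in for the Random Forest used in the paper; the case encoding (a numeric feature tuple plus a flag saying whether a given covariate was used) is a hypothetical simplification.

```python
import math


def nearest_neighbour(train, x):
    """1-NN classifier; a dependency-free stand-in for Random Forest (assumption)."""
    _, label = min(train, key=lambda case: math.dist(case[0], x))
    return label


def leave_one_out(cases, classify):
    """Hold out each case once, train on the rest, and predict the held-out case."""
    predictions = []
    for i, (features, _) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        predictions.append(classify(training, features))
    return predictions


def accuracy(cases, predictions):
    """Share of held-out cases whose covariate decision was predicted correctly."""
    hits = sum(pred == label for (_, label), pred in zip(cases, predictions))
    return hits / len(cases)


# Hypothetical cases: (application features, was the covariate used?).
cases = [((0.0, 0.0), True), ((0.1, 0.0), True),
         ((5.0, 5.0), False), ((5.1, 5.0), False)]
preds = leave_one_out(cases, nearest_neighbour)
print(preds, accuracy(cases, preds))  # [True, True, False, False] 1.0
```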
Funding: Supported by the National Natural Science Foundation of China (41130530, 91325301, 41431177, 41571212, 41401237), the Project of "One-Three-Five" Strategic Planning & Frontier Sciences of the Institute of Soil Science, Chinese Academy of Sciences (ISSASIP1622), the Government Interest Related Program between Canadian Space Agency and Agriculture and Agri-Food, Canada (13MOA01002), and the Natural Science Research Program of Jiangsu Province (14KJA170001).
Abstract: Conventional soil maps generally contain one or more soil types within a single soil polygon, but the geographic locations of these types within the polygon are not specified. This restricts current applications of the maps in site-specific agricultural management and environmental modelling. We examined the utility of legacy pedon data for disaggregating soil polygons and the effectiveness of similarity-based prediction for making use of under- or over-sampled legacy pedon data in the disaggregation. The method consisted of three steps. First, environmental similarities between the pedon sites and each location were computed based on soil-forming environmental factors. Second, the similarities were aggregated according to the soil types of the pedon sites to derive a similarity distribution for each soil type. Third, a hardening process was performed to allocate candidate soil types within the polygons. The study was conducted at the soil subgroup level in a semi-arid area in Manitoba, Canada. Based on 186 independent pedon sites, the evaluation of the disaggregated soil subgroup map showed an overall accuracy of 67% and a Kappa statistic of 0.62. The map represented the spatial pattern of soil subgroups in more detail and with higher accuracy than the dominant soil subgroup map commonly used in practice. Incorrect predictions mainly occurred in the agricultural plain area and among soil subgroups that are taxonomically very similar, indicating that new environmental covariates need to be developed. We concluded that combining legacy pedon data with similarity-based prediction is an effective solution for soil polygon disaggregation.
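The three disaggregation steps can be sketched in Python as follows. The similarity measure (Gower-style, minimum across factors), the aggregation rule (maximum similarity over pedons of the same type), and the hardening rule (the candidate type with the highest similarity wins) are assumptions for illustration; the abstract does not give the exact formulas. The soil-type labels "A" and "B" are placeholders.

```python
def env_similarity(a, b, ranges):
    """Step 1: similarity of two locations over soil-forming factors (assumption)."""
    return min(max(0.0, 1 - abs(a[k] - b[k]) / ranges[k]) for k in ranges)


def type_similarities(location, pedons, ranges):
    """Step 2: aggregate pedon similarities per soil type, keeping the maximum (assumption)."""
    result = {}
    for env, soil_type in pedons:
        s = env_similarity(location, env, ranges)
        result[soil_type] = max(result.get(soil_type, 0.0), s)
    return result


def harden(location, candidates, pedons, ranges):
    """Step 3: assign the polygon's candidate type with the highest similarity."""
    sims = type_similarities(location, pedons, ranges)
    return max(candidates, key=lambda t: sims.get(t, 0.0))


# Usage with hypothetical pedons described by elevation only.
pedons = [({"elevation": 300.0}, "A"), ({"elevation": 320.0}, "A"),
          ({"elevation": 500.0}, "B")]
ranges = {"elevation": 400.0}
print(harden({"elevation": 480.0}, ["A", "B"], pedons, ranges))  # B
print(harden({"elevation": 310.0}, ["A", "B"], pedons, ranges))  # A
```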
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41601411, 41671389, 41571383 & 41271438) and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) (Grant No. 164320H101).
Abstract: The inhomogeneous and non-flat paleotopography in a depositional landform area profoundly controls the process of modern gully evolution and shapes the structure of a gully network. However, this controlling effect of paleotopography on modern gully evolution is mostly ignored because of the difficulties of reconstructing paleotopography. In this study, the loess area of China was selected as the case area because it is a typical depositional landform area with inhomogeneous and non-flat paleotopography during the Quaternary. The paleotopography underlying the loess is considered when evaluating its controlling effects on the gully evolutionary process. Based on geophysical prospecting, detailed geological information, and a high-resolution digital elevation model, we reconstruct the pre-Quaternary paleotopographic surface in the case area. A comparative analysis is conducted to reveal modern gully evolution in relation to the paleotopography. Results show that the concave areas of the paleotopography act as the basement of high-order modern gully evolution in the hilly-gully area, even though these concave areas may be covered and buried by loess deposition during the Quaternary. A significant controlling effect of paleotopography on high-order modern gully evolution can be observed in depositional landforms with a hilly-gully underlying topography, whereas the controlling effect is relatively weak where the underlying topography is flat, because of the strong horizontal shift effect of the gully formation process. Several low-order modern gullies also exist and limit the controlling effect of paleotopography. These results reveal a controlled high-order modern gully evolutionary process and a rather dynamic low-order modern gully evolutionary process in the hilly-gully area. They also help explain the variations in modern gully evolution in relation to paleotopography and the different management schemes needed for soil conservation and ecological restoration during the gully evolutionary process.
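The abstract distinguishes high-order from low-order gullies but does not say which ordering scheme was used. Strahler ordering is one common scheme for channel and gully networks, and it is sketched here only to illustrate what "order" means: a source gully has order 1, and the order increases by one only where at least two branches of the same highest order meet. The dictionary-based network encoding and the function name are hypothetical.

```python
def strahler_order(network, node):
    """Strahler order of `node`; network maps a node to its upstream tributaries."""
    tributaries = network.get(node, [])
    if not tributaries:
        return 1  # a source (fingertip) gully
    orders = [strahler_order(network, t) for t in tributaries]
    highest = max(orders)
    # Order increases only where at least two tributaries share the highest order.
    return highest + 1 if orders.count(highest) >= 2 else highest


# Usage on a hypothetical network: two order-2 branches and one order-1 branch
# join at the outlet, so the outlet segment has order 3.
network = {"outlet": ["a", "b", "c"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
print(strahler_order(network, "outlet"))  # 3
```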