Abstract: The k-nearest neighbor (k-NN) method was evaluated for predicting the influent flow rate and four water quality parameters, namely chemical oxygen demand (COD), suspended solids (SS), total nitrogen (T-N), and total phosphorus (T-P), at a wastewater treatment plant (WWTP). The search range and the approach for determining the number of nearest neighbors (NNs) under dry and wet weather conditions were first optimized based on the root mean square error (RMSE). Considering data size, the optimum search range was one year. The square root-based (SR) approach was superior to the distance factor-based (DF) approach in determining the appropriate number of NNs, although the results for both approaches varied slightly depending on the water quality parameter and the weather conditions. The influent flow rate was accurately predicted within one standard deviation of the measured values. Influent water quality was well predicted, as judged by the mean absolute percentage error (MAPE), under both wet and dry weather conditions. For the seven-day prediction, the difference in predictive accuracy was less than 5% in dry weather conditions and slightly worse in wet weather conditions. Overall, the k-NN method was verified to be useful for predicting WWTP influent characteristics.
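The prediction scheme in this abstract can be illustrated with a minimal sketch. It assumes that the SR approach sets the number of neighbors to roughly the square root of the number of historical samples in the search range, and that a prediction is the plain mean of the neighbors' targets; the abstract confirms neither detail, and all function names are hypothetical.

```python
import math

def sr_neighbor_count(n_samples):
    """Assumed square-root rule: k is the rounded square root of the sample count."""
    return max(1, round(math.sqrt(n_samples)))

def knn_predict(history, query, k):
    """Predict a target as the mean target of the k nearest historical records.

    history: list of (feature_vector, target) pairs within the search range.
    query:   feature vector for the day to be predicted.
    """
    ranked = sorted(history, key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)

def rmse(actual, predicted):
    """Root mean square error, the criterion used to tune the search range and k."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

With a one-year search range of daily records, `sr_neighbor_count(365)` gives 19 neighbors under this assumed rule.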
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004142) and partly by the National Natural Science Foundation of China (No. 60275007).
Abstract: G-protein coupled receptors (GPCRs) are a class of seven-helix transmembrane proteins that have been used in bioinformatics as targets to facilitate drug discovery for human diseases. Although thousands of GPCR sequences have been collected, the ligand specificity of many GPCRs is still unknown, and only one crystal structure of the rhodopsin-like family has been solved. Therefore, identifying GPCR types from sequence data alone has become an important research issue. In this study, a novel technique for identifying GPCR types is introduced, based on the weighted Levenshtein distance between two receptor sequences and the nearest neighbor method (NNM); it can handle receptor sequences of different lengths directly. In our experiments on classifying four classes (acetylcholine, adrenoceptor, dopamine, and serotonin) of the rhodopsin-like family of GPCRs, the error rates from the leave-one-out and leave-half-out procedures were 0.62% and 1.24%, respectively. These results are superior to those of the covariant discriminant algorithm, the support vector machine method, and the NNM with Euclidean distance.
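As a rough illustration of how an edit distance lets a nearest neighbor classifier compare sequences of unequal length, the sketch below pairs a weighted Levenshtein distance with a 1-nearest-neighbor rule and a leave-one-out error estimate. The abstract does not state the weighting scheme, so the substitution cost is left as a caller-supplied function (an assumption), and the function names are hypothetical.

```python
def weighted_levenshtein(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance with a caller-supplied substitution cost (assumed weighting)."""
    if sub_cost is None:
        # Default to unit costs, which reduces to the ordinary Levenshtein distance.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]

def nearest_neighbor_label(query, labeled, distance=weighted_levenshtein):
    """Return the label of the training sequence closest to the query."""
    return min(labeled, key=lambda item: distance(query, item[0]))[1]

def leave_one_out_error(labeled, distance=weighted_levenshtein):
    """Fraction of sequences misclassified when each is held out in turn."""
    errors = 0
    for i, (seq, label) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]
        if nearest_neighbor_label(seq, rest, distance) != label:
            errors += 1
    return errors / len(labeled)
```

Because the distance is defined on strings of any length, no alignment to a fixed-length feature vector is needed, which is the property the abstract highlights.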
Abstract: In this paper, the Edgeworth expansion for the nearest neighbor-kernel estimate and the random weighting approximation of the conditional density are given, and the consistency and convergence rate are proved.
Abstract: Consider the regression model $Y = X\beta + g(T) + e$, where $g$ is an unknown smooth function on $[0, 1]$, $\beta$ is a l-dimensional parameter to be estimated, and $e$ is an unobserved error. When the data are randomly censored, the estimators $\beta_n^*$ and $g_n^*$ for $\beta$ and $g$ are obtained by using the class K and least squares methods. It is shown that $\beta_n^*$ is asymptotically normal and that $g_n^*$ achieves the convergence rate $O(n^{-1/3})$.
Funding: Supported by the National Natural Science Foundation of China (41805080); the Natural Science Foundation of Anhui Province, China (1708085QD89); the Key Research and Development Program Projects of Anhui Province, China (201904a07020099); and the Open Foundation Project of the Shenyang Institute of Atmospheric Environment, China Meteorological Administration (2016SYIAE14).
Abstract: This paper studies the application of a precipitation retrieval algorithm based on Himawari-8 (H8) satellite infrared data. From GPM precipitation data and H8 infrared channel brightness temperature data, a corresponding "precipitation field dictionary" and "channel brightness temperature dictionary" are formed. The retrieval of the precipitation field from brightness temperature data is studied through the k-nearest neighbor (KNN) classification rule and a regularization constraint. First, the corresponding "dictionary" is constructed from the training sample database of matched GPM precipitation data and H8 brightness temperature data. Second, because small-scale precipitation features in different storm environments often recur, KNN is used to identify the spectral brightness temperature signals of "precipitation" and "non-precipitation" based on the "dictionary". Finally, the precipitation field is retrieved in the precipitation signal "subspace" using the regularization term constraint method. During retrieval, the contribution rate of each channel's brightness temperature was determined by a Bayesian model averaging (BMA) model. Preliminary experimental results based on quantitative evaluation indexes show that the precipitation retrieved from H8 correlates well with the GPM truth value, with a small error and a similar structure.
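The KNN detection step described in this abstract can be sketched as a majority vote over the nearest entries of a labeled brightness temperature "dictionary". This is only an illustration under assumptions: the Euclidean distance, the value of k, and the function name are not given in the abstract, and the regularized retrieval and BMA channel weighting are not shown.

```python
import math
from collections import Counter

def knn_classify(dictionary, query, k=3):
    """Label a brightness temperature vector by majority vote of its k nearest entries.

    dictionary: list of (brightness_temperature_vector, label) pairs, where the
                label is "precipitation" or "non-precipitation".
    query:      brightness temperature vector for the pixel to classify.
    """
    ranked = sorted(dictionary, key=lambda entry: math.dist(entry[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Choosing an odd k avoids tied votes between the two classes; only pixels labeled "precipitation" would then pass to the retrieval step.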