Funding: The National Natural Science Foundation of China (No. 60672049); the Science Foundation of Henan University of Technology (No. 06XJC032).
Abstract: Based on the generalization of the central limit theorem (CLT) to special dependent variables, this paper shows that maximization of a non-Gaussianity (NG) measure can separate statistically dependent source signals, and the novel NG measure is given by Cook's Euclidean distance using the Chebyshev-Hermite series expansion. A novel blind source separation (BSS) algorithm for linearly mixed signals is then proposed using Cook's NG measure, which makes it possible to separate statistically dependent source signals. Moreover, the proposed separation algorithm reduces to the well-known FastICA algorithm. Simulation results show that the proposed algorithm is able to separate dependent signals and yield ideal separation results.
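As a point of reference for the FastICA special case named in the abstract, the sketch below separates a simple linear mixture with scikit-learn's FastICA; the source signals, mixing matrix, and sample size are illustrative assumptions, and the Cook-distance-based NG measure itself is not reproduced here.

```python
# Sketch: separating a linear mixture with FastICA, the baseline the proposed
# NG-measure algorithm is said to reduce to. Signals and the mixing matrix
# below are illustrative assumptions, not taken from the paper.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * np.pi * t)                   # sinusoidal source
s2 = np.sign(np.sin(3 * np.pi * t + 0.5))    # square-wave source
S = np.c_[s1, s2]

A = np.array([[1.0, 0.6],                    # assumed mixing matrix
              [0.4, 1.0]])
X = S @ A.T                                  # observed linear mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                 # recovered sources (up to scale/order)
print("estimated unmixing matrix:\n", ica.components_)
```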
Funding: The National Natural Science Foundation of China (No. 11171065); the Natural Science Foundation of Jiangsu Province (No. BK2011058).
Abstract: In order to detect whether data conform to a given model, it is necessary to diagnose the data statistically. The diagnostic problem in generalized nonlinear models based on maximum Lq-likelihood estimation is considered. Three diagnostic statistics are used to detect whether outliers exist in the data set. Simulation results show that when the sample size is small, the values of the diagnostic statistics based on maximum Lq-likelihood estimation are greater than those based on maximum likelihood estimation. As the sample size increases, the difference between the values of the diagnostic statistics based on the two estimation methods diminishes gradually. This means that outliers can be identified more easily by the maximum Lq-likelihood method than by the maximum likelihood estimation method.
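A minimal sketch of the maximum Lq-likelihood idea, assuming a normal working density, the usual distorted logarithm Lq(u) = (u^(1-q) - 1)/(1 - q), and a fixed q = 0.9; the data and the value of q are made up, and the paper's diagnostic statistics are not reproduced.

```python
# Sketch: maximum Lq-likelihood estimation for a normal mean and variance,
# with Lq(u) = (u^(1-q) - 1)/(1 - q), which tends to log(u) as q -> 1.
# The data and the choice q = 0.9 are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
y = np.r_[rng.normal(0.0, 1.0, 48), 8.0, 9.0]   # small sample with two outliers

def lq(u, q):
    return np.log(u) if q == 1.0 else (u ** (1.0 - q) - 1.0) / (1.0 - q)

def neg_lq_likelihood(theta, y, q):
    mu, log_sigma = theta
    dens = norm.pdf(y, loc=mu, scale=np.exp(log_sigma))
    return -np.sum(lq(dens, q))

mle  = minimize(neg_lq_likelihood, x0=[0.0, 0.0], args=(y, 1.0))   # q = 1: MLE
mlqe = minimize(neg_lq_likelihood, x0=[0.0, 0.0], args=(y, 0.9))   # q < 1: MLqE
print("MLE  mean:", mle.x[0])    # pulled toward the outliers
print("MLqE mean:", mlqe.x[0])   # typically less affected for q < 1
```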
Funding: The research project was supported by NSFC (No. 19631040) and NSFJ.
Abstract: In this paper, a unified diagnostic method for nonlinear models with random effects, based upon the joint likelihood given by Robinson in 1991, is presented. It is shown that the case deletion model is equivalent to the mean shift outlier model. From this point of view, several diagnostic measures, such as the Cook distance and score statistics, are derived. Cook's local influence measure is also presented. A numerical example illustrates that the method is effective.
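For reference, the mean shift outlier model and the Cook distance take the familiar forms below in the ordinary linear-model special case; the notation (design matrix X, residual e_i, leverage h_ii, p parameters) is generic and not specific to the random-effects setting of the paper.

```latex
% Mean-shift outlier model for case i, and the classical Cook distance
% written both as a case-deletion contrast and as a function of the
% residual and leverage (linear-model special case).
\begin{align}
  y &= X\beta + d_i\gamma + \varepsilon, \qquad
      d_i = (0,\dots,0,\underbrace{1}_{i},0,\dots,0)^{\top},\\
  D_i &= \frac{(\hat\beta - \hat\beta_{(i)})^{\top} X^{\top}X\,(\hat\beta - \hat\beta_{(i)})}{p\,\hat\sigma^{2}}
       = \frac{e_i^{2}}{p\,\hat\sigma^{2}}\cdot\frac{h_{ii}}{(1-h_{ii})^{2}}.
\end{align}
```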
Abstract: This paper presents a unified diagnostic method for exponential nonlinear models with random effects based upon the joint likelihood given by Robinson in 1991. The authors show that the case deletion model is equivalent to the mean shift outlier model. From this point of view, several diagnostic measures, such as the Cook distance and score statistics, are derived. Cook's local influence measure is also presented. A numerical example illustrates that the method is effective.
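The case-deletion view can be checked numerically in the linear special case: explicitly deleting case i and refitting gives the same Cook distance as the closed-form value reported by statsmodels. The simulated data below are illustrative only.

```python
# Sketch: Cook's distance by explicit case deletion versus the closed form,
# in the linear special case. Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=40)
y[5] += 6.0                                    # injected outlier
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()
p, sigma2 = X.shape[1], fit.mse_resid
cooks_closed_form = fit.get_influence().cooks_distance[0]

def cooks_by_deletion(i):
    beta_i = sm.OLS(np.delete(y, i), np.delete(X, i, axis=0)).fit().params
    diff = fit.params - beta_i
    return float(diff @ (X.T @ X) @ diff) / (p * sigma2)

print(cooks_closed_form[5], cooks_by_deletion(5))   # the two values coincide
```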
Abstract: This paper transforms fuzzy numbers into crisp numbers using the centroid method, so that the fuzzy linear regression model can be studied as a traditional linear regression model. The model's inputs and outputs are fuzzy numbers, while the regression coefficients are crisp numbers. This paper considers parameter estimation and influence analysis based on data deletion. Through the study of an example and comparison with other models, it can be concluded that the proposed model is easy to apply and performs better.
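A minimal sketch of the centroid idea, assuming triangular fuzzy numbers (a, b, c) whose centroid is (a + b + c)/3; the data are made up, and the data-deletion influence step simply reuses Cook's distance on the defuzzified regression.

```python
# Sketch: defuzzify triangular fuzzy numbers (a, b, c) by the centroid
# (a + b + c) / 3, fit an ordinary linear regression, and assess
# data-deletion influence via Cook's distance. Data are illustrative.
import numpy as np
import statsmodels.api as sm

# each observation: input and output given as triangular fuzzy numbers
fuzzy_x = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [6, 7, 8], [8, 9, 13]])
fuzzy_y = np.array([[2, 3, 4], [3, 4, 5], [5, 6, 7], [7, 8, 9], [14, 15, 16]])

x = fuzzy_x.mean(axis=1)          # centroid of (a, b, c) is (a + b + c) / 3
y = fuzzy_y.mean(axis=1)

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit.get_influence().cooks_distance[0]
print("coefficients:", fit.params)
print("Cook's distances:", cooks_d)   # large values flag influential cases
```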
Abstract: Outlier detection is an important type of data screening. RIM is a mechanism of outlier detection that identifies the contribution of data points in a regression model. A BIC-based RIM is developed in this work to simultaneously detect influential data points and select optimal predictor variables. It adds to the existing literature both an alternative to the AIC- and Mallows' Cp-based RIMs and conditions for no influence, some influence, and a perfectly single outlier data point in an entire data set, which are proposed in this work. The method is implemented in R by an algorithm that iterates over all data points, deleting data points one at a time while computing BICs and selecting optimal predictors alongside RIMs. From analyses of evaporation data comparing the proposed method with the existing methods, the results show that the data cases identified as highly influential by the two existing methods are also identified by the proposed method. The three methods show the same performance; hence the relevance of the BIC-based RIM cannot be undermined.
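The leave-one-out loop described above can be sketched as follows: delete one observation at a time, run best-subset selection by BIC on the remaining data, and record the chosen predictors and the resulting BIC. This is a Python proxy for the paper's R implementation; the exact RIM statistic is not reproduced, and the data are simulated.

```python
# Sketch of the deletion loop: for each case i, refit on the data without i,
# choosing the predictor subset that minimizes BIC. Illustrative data only.
from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 30
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

def best_subset_bic(Xs, ys):
    best = None
    for k in range(1, Xs.shape[1] + 1):
        for cols in combinations(range(Xs.shape[1]), k):
            bic = sm.OLS(ys, sm.add_constant(Xs[:, cols])).fit().bic
            if best is None or bic < best[0]:
                best = (bic, cols)
    return best

for i in range(n):
    bic_i, cols_i = best_subset_bic(np.delete(X, i, axis=0), np.delete(y, i))
    print(f"case {i:2d} deleted -> BIC {bic_i:8.2f}, predictors {cols_i}")
```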
Abstract: Outlier mining is an important aspect of data mining, and outlier mining based on the Cook distance is most commonly used. However, when the data exhibit multicollinearity, the traditional Cook method is no longer effective. Considering the merits of principal component estimation, we use it in place of least squares estimation and then give a Cook distance measure based on principal component estimation, which can be used in outlier mining. At the same time, we present some research on related theory and application problems.
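A sketch of the substitution described above: under multicollinearity, replace the least squares estimate with a principal component estimate and compute a Cook-type distance from the change in the PC-based coefficients when one case is deleted. The data, the number of retained components, and the scaling are illustrative assumptions, not the paper's definition.

```python
# Sketch: Cook-type distances built on a principal component estimate,
# for data with two nearly collinear predictors. Illustrative only.
import numpy as np

rng = np.random.default_rng(4)
n = 50
z = rng.normal(size=n)
X = np.c_[z + 0.01 * rng.normal(size=n),      # two nearly collinear columns
          z + 0.01 * rng.normal(size=n),
          rng.normal(size=n)]
y = X @ np.array([1.0, 1.0, -2.0]) + rng.normal(scale=0.3, size=n)

def pc_estimate(X, y, k=2):
    """Principal component estimate of the regression coefficients."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                               # keep the k leading directions
    return V @ np.linalg.lstsq(Xc @ V, yc, rcond=None)[0]

Xc = X - X.mean(axis=0)
beta = pc_estimate(X, y)
s2 = np.var(y - y.mean() - Xc @ beta, ddof=X.shape[1])
cook_pc = []
for i in range(n):
    d = beta - pc_estimate(np.delete(X, i, axis=0), np.delete(y, i))
    cook_pc.append(float(d @ (Xc.T @ Xc) @ d) / (X.shape[1] * s2))
print(np.round(cook_pc, 3))                    # large values flag candidate outliers
```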
Funding: Partly supported by the Major Project of the National Social Science Foundation of China under Grant No. 18VZL006; the National Natural Science Foundation of China under Grant Nos. 71571126 and 71974139; the Excellent Youth Foundation of Sichuan Province under Grant No. 20JCQN0225; the Tianfu Ten-thousand Talents Program of Sichuan Province; the Excellent Youth Foundation of Sichuan University under Grant No. sksyl201709; the Leading Cultivation Talents Program of Sichuan University; the Teacher and Student Joint Innovation Project of Business School of Sichuan University under Grant No. LH2018011; and the 2018 Special Project for Cultivation and Innovation of New Academic Qian Platform Talent under Grant No. 5772-012.
Abstract: In many practical classification problems, datasets contain a portion of outliers, which can greatly affect the performance of the constructed models. In order to address this issue, we apply the group method of data handling (GMDH) neural network to outlier detection. This study builds a GMDH-based outlier detection (GOD) model. The model first performs feature selection on the training set using the GMDH neural network; a new training set is then obtained by mapping the selected key feature subset. Next, a linear regression model is constructed on this set by ordinary least squares estimation. Further, one sample is removed from the set at a time and the linear regression model is rebuilt. Finally, outlier detection is realized by calculating Cook's distance for each sample. Four different customer classification datasets are used to conduct experiments. The results show that the GOD model can effectively eliminate outliers and, compared with five existing outlier detection models, it generally performs significantly better. This indicates that eliminating outliers can effectively enhance the classification accuracy of the trained classification model.
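The outlier-elimination step can be sketched as below. GMDH-based feature selection is replaced here by a simple univariate-correlation proxy, so this is only an approximation of the GOD model, and the data are simulated; only the regression-plus-Cook's-distance part follows the description above.

```python
# Sketch of the elimination step: select features (proxy for GMDH selection),
# fit OLS on the mapped training set, compute Cook's distance per sample,
# and drop samples exceeding a rule-of-thumb threshold. Illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=n)
y[:3] += 10.0                                   # contaminate a few samples

# proxy feature selection: keep the 3 features most correlated with y
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-3:]
L_new = sm.add_constant(X[:, keep])             # mapped training set

fit = sm.OLS(y, L_new).fit()
cooks_d = fit.get_influence().cooks_distance[0]
threshold = 4.0 / n                             # a common rule of thumb
print("flagged as outliers:", np.where(cooks_d > threshold)[0])
```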
Funding: The National Natural Science Foundation of China (Nos. 10531030, 60675013).
Abstract: When a real-world data set is fitted to a specific type of model, it is often the case that one observation or a set of observations has undue influence on the model fit, which may lead to misleading conclusions. Therefore, it is necessary for data analysts to identify these influential observations and assess their impact on various aspects of model fitting. In this paper, a type of modified Cook's distance is defined to gauge the influence of one observation or a set of observations on the estimate of the constant-coefficient part in partially varying-coefficient models, and the Cook's distances are expressed as functions of the corresponding residuals and leverages. Meanwhile, a bootstrap procedure is suggested to derive reference values for the proposed Cook's distances. Some simulations are conducted, and a real-world data set is further analyzed to examine the performance of the proposed method. The experimental results are satisfactory.
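The bootstrap idea for reference values can be sketched in the ordinary linear-model case rather than the partially varying-coefficient setting: resample residuals under the fitted model, recompute the Cook's distances, and take an upper quantile of the bootstrap distribution as the cutoff. The data, the number of replications, and the 95% quantile are illustrative assumptions.

```python
# Sketch: residual-bootstrap reference value for Cook's distances in a
# linear model, used as a cutoff for the observed distances. Illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 60
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(scale=0.4, size=n)
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()
cooks_obs = fit.get_influence().cooks_distance[0]

B, boot_max = 500, []
for _ in range(B):
    y_b = fit.fittedvalues + rng.choice(fit.resid, size=n, replace=True)
    d_b = sm.OLS(y_b, X).fit().get_influence().cooks_distance[0]
    boot_max.append(d_b.max())

cutoff = np.quantile(boot_max, 0.95)            # bootstrap reference value
print("cases exceeding the reference value:", np.where(cooks_obs > cutoff)[0])
```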