A new parallel expectation-maximization (EM) algorithm is proposed for large databases, with the aim of accelerating the EM algorithm. EM is a well-known method for estimation in generic statistical problems and has been widely used in many domains, but it often requires significant computational resources. More elaborate methods are therefore needed to adapt it to databases with a large number of records or high dimensionality. The proposed parallel EM algorithm is based on partial E-steps, which retain the standard convergence guarantee of EM, and it takes full advantage of parallel computation. Applied to large databases, the algorithm achieved a speedup of about 2.6 over the standard EM algorithm, and its running time decreases nearly linearly as the number of processors increases.
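One common way to parallelize EM is to split the records into chunks, run the E-step on each chunk independently, and add the resulting sufficient statistics before a single M-step. The sketch below does this for an assumed one-dimensional Gaussian mixture. It is an illustration only, not the authors' partial E-step algorithm, and all function names are hypothetical. Threads stand in for processors here; because of CPython's GIL, a process pool or MPI would be needed to obtain real speedups.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def partial_e_step(chunk, weights, means, variances):
    """E-step on one data chunk; returns the chunk's sufficient statistics."""
    k = len(weights)
    n_k, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in chunk:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total          # responsibility of component j for x
            n_k[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return n_k, s1, s2


def parallel_em(data, weights, means, variances, n_workers=4, n_iter=50):
    """EM for a 1-D Gaussian mixture with the E-step split across workers."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    k = len(weights)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            job = partial(partial_e_step, weights=weights, means=means, variances=variances)
            stats = list(pool.map(job, chunks))
            # Reduce: sufficient statistics are sums, so they add across chunks.
            n_k = [sum(s[0][j] for s in stats) for j in range(k)]
            s1 = [sum(s[1][j] for s in stats) for j in range(k)]
            s2 = [sum(s[2][j] for s in stats) for j in range(k)]
            # M-step on the pooled statistics: the same update as serial EM.
            weights = [n / len(data) for n in n_k]
            means = [s1[j] / n_k[j] for j in range(k)]
            variances = [max(s2[j] / n_k[j] - means[j] ** 2, 1e-9) for j in range(k)]
    return weights, means, variances
```

The pooled statistics are the same sums a serial E-step would produce, so each iteration matches serial EM up to floating-point rounding. This is why such schemes keep EM's convergence behaviour.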
Compositional data, which carry relative information, are an important data type in machine learning and related fields. They are typically recorded as closed data, that is, as parts that sum to a constant such as 100%. The linear regression model is the most widely used statistical technique for identifying hidden relationships between underlying random variables of interest. Maximum likelihood estimation (MLE) is the method of choice for estimating its parameters, which are useful for tasks such as prediction and analysis of the partial effects of independent variables. However, data quality is a significant challenge in machine learning, especially when data are missing: many datasets contain missing observations, and recovering them can be costly and time-consuming. The expectation-maximization (EM) algorithm has been suggested as a solution in such situations. It iteratively computes maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function, and the maximization (M) step finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performs on a synthetic compositional dataset with missing observations, using both ordinary least squares and robust least squares regression.
The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
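Imputed compositions are compared through the Aitchison distance. That distance is the Euclidean distance between centred log-ratio (clr) transforms, so it ignores the arbitrary total to which a composition is closed. The minimal sketch below implements that standard formula. The study's own evaluation pipeline is not described in the abstract, and the function names are illustrative.

```python
import math


def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Aitchison distance between two compositions with strictly positive parts."""
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("all parts must be strictly positive")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))
```

Because of the clr step, `[10, 30, 60]` and `[1, 3, 6]` are the same composition and their distance is zero up to rounding. This property is what makes the metric suitable for closed data.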
Cyber losses, measured as the number of records breached in cyber incidents, typically feature a substantial proportion of zeros together with distinct characteristics in the mid-range and large losses. This makes it hard to model the whole range of losses with a standard loss distribution. We tackle this problem by proposing a three-component spliced regression model that simultaneously models zero, moderate, and large losses and accounts for heterogeneous effects in the mixture components. To apply the proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we first segment geographical groups using unsupervised cluster analysis. We then use a covariate-dependent probability to model zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses, capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated with the expectation-maximization (EM) algorithm. Combined with our frequency model (a generalized linear mixed model) for data breaches, the model is used to investigate aggregate loss distributions, and applications to cyber insurance pricing and risk management are discussed.
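The cumulative distribution function of a three-part spliced loss model can be sketched as follows. The sketch makes several simplifying assumptions. A constant zero probability replaces the covariate-dependent one. A single lognormal truncated to (0, threshold] replaces the finite-mixture body. A generalized Pareto distribution (GPD), one common extreme value choice, serves as the tail. All parameter values used with it are hypothetical, and the EM fitting step is not shown.

```python
import math


def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def gpd_cdf(y, xi, scale):
    """Generalized Pareto CDF for exceedances y over the threshold."""
    if y <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-y / scale)
    z = 1.0 + xi * y / scale
    if z <= 0:                      # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def spliced_cdf(x, p_zero, p_body, threshold, mu, sigma, xi, scale):
    """Zero point mass + truncated lognormal body + GPD tail above threshold."""
    if p_zero < 0 or p_body < 0 or p_zero + p_body > 1:
        raise ValueError("component probabilities must be valid")
    if x < 0:
        return 0.0
    if x <= threshold:
        body = lognormal_cdf(x, mu, sigma) / lognormal_cdf(threshold, mu, sigma)
        return p_zero + p_body * body
    p_tail = 1.0 - p_zero - p_body
    return p_zero + p_body + p_tail * gpd_cdf(x - threshold, xi, scale)
```

The CDF jumps by `p_zero` at zero, reaches `p_zero + p_body` at the threshold, and approaches 1 through the heavy GPD tail. These properties mirror the three loss regimes described above.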
A two-dimensional (2-D) polynomial regression model is set up to approximate the time-frequency response of slowly time-varying orthogonal frequency-division multiplexing (OFDM) systems. With this model, estimating the OFDM time-frequency response reduces to optimizing a set of time-invariant model parameters. A new algorithm based on the expectation-maximization (EM) method is proposed to obtain maximum-likelihood (ML) estimates of the polynomial model parameters from the 2-D observed data. To reduce complexity and avoid computational instability, a novel recursive approach (RPEMTO) is also given for computing the parameter values. It is further shown that this 2-D polynomial EM-based algorithm for time-varying OFDM (PEMTO) can be simplified mathematically to handle one-dimensional sequential estimation. Simulations show that the proposed algorithms achieve a lower bit error rate (BER) than other blind algorithms.
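A 2-D polynomial model of the time-frequency response can be written out as below. The notation is assumed, not the paper's own. Here n indexes OFDM symbols (time) and k indexes subcarriers (frequency). P and Q are the polynomial orders, X(n,k) is the transmitted symbol, and W(n,k) is additive noise.

```latex
H(n,k) \approx \sum_{p=0}^{P} \sum_{q=0}^{Q} c_{pq}\, n^{p} k^{q},
\qquad
Y(n,k) = H(n,k)\, X(n,k) + W(n,k)
```

The coefficients c_pq do not change over the observation window, so estimating H(n,k) reduces to estimating (P+1)(Q+1) numbers. In a blind EM formulation of this kind, the unknown data symbols X(n,k) would be the natural choice of hidden data.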
An improved Gaussian mixture model (GMM)-based clustering method is proposed for the difficult case in which the true distribution of the data departs from the assumed GMM. First, an improved model selection criterion, the completed likelihood minimum message length criterion, is derived. It measures both the goodness of fit of the candidate GMM to the data and the goodness of the resulting partition of the data. Second, using the proposed criterion as the clustering objective function, an improved expectation-maximization (EM) algorithm is developed for estimating the model parameters; compared with the standard EM algorithm, it avoids poor locally optimal solutions. Experimental results demonstrate that the proposed method corrects the over-fitting tendency of representative GMM-based clustering approaches and robustly provides more accurate clustering results.
A new method is proposed that uses a modified ordered subsets (MOS) algorithm to improve the convergence rate of the space-alternating generalized expectation-maximization (SAGE) algorithm for positron emission tomography (PET) image reconstruction. In the MOS-SAGE algorithm, the number of projections per subset and the access order of the subsets are modified to improve the quality of the reconstructed images and accelerate convergence. The number of projections in a subset increases as 2, 4, 8, 16, 32, and 64. With this sequence, the high-frequency components are recovered first and the low-frequency components are recovered in subsequent iterations. In addition, neighboring subsets are separated as much as possible, which decreases the correlation between projections and speeds up convergence. Applied to simulated and real images, the MOS-SAGE algorithm outperforms both the SAGE algorithm and the OSEM algorithm in convergence and image quality.
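The subset schedule described above can be generated as shown below. The sketch assumes 128 projection angles in total, which is hypothetical because the abstract gives only the subset sizes. Each subset is built from evenly interleaved angles. Subsets are visited in bit-reversed order, one standard way to keep consecutively accessed subsets far apart; the paper's exact ordering is not specified. The SAGE update itself is not shown, and the function names are illustrative.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (e.g. 1 -> 2 for bits=2)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def subsets_for_size(n_angles, subset_size):
    """Interleaved subsets of projection angles, listed in bit-reversed access order."""
    n_subsets = n_angles // subset_size
    if n_subsets * subset_size != n_angles or n_subsets & (n_subsets - 1):
        raise ValueError("n_angles / subset_size must be a power of two")
    bits = n_subsets.bit_length() - 1
    order = [bit_reverse(j, bits) for j in range(n_subsets)]
    # Subset j takes every n_subsets-th angle starting at angle j.
    return [list(range(j, n_angles, n_subsets)) for j in order]


def mos_schedule(n_angles=128, sizes=(2, 4, 8, 16, 32, 64)):
    """Map each subset size (growing from stage to stage) to its ordered subsets."""
    return {s: subsets_for_size(n_angles, s) for s in sizes}
```

With 8 angles and subsets of size 2, the access order is `[0, 4]`, `[2, 6]`, `[1, 5]`, `[3, 7]`, so each subset is followed by one that shares no neighboring angle.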
Predicting neuron growth helps in understanding neuronal morphology and is therefore useful for research on neuron classification. This study proposes a new method for predicting the growth of human neurons using 1,907 data sets of human brain pyramidal neurons obtained from NeuroMorpho.Org. First, the neurons were analyzed in terms of morphology, and an expectation-maximization algorithm was used to partition them into six clusters. Second, a naive Bayes classifier was used to verify the accuracy of the expectation-maximization clustering. The experimental results showed that the resulting clusters were effective and feasible. Finally, a new method for ranking the six clusters produced by the expectation-maximization algorithm was used to predict the growth of human pyramidal neurons.
In standard interval mapping (IM) of quantitative trait loci (QTL), the QTL effect is described by a normal mixture model. When this normality assumption is violated, the most commonly adopted strategy is to apply the same model after transforming the data. However, an appropriate transformation may not exist or may be difficult to find, and this approach can also raise interpretation issues. An interesting alternative is to use a skew-normal mixture model in standard IM; the resulting method is denoted here as skew-normal IM. This flexible model includes the usual symmetric normal distribution as a special case and allows continuous variation from normality to non-normality. In this paper we briefly introduce the main properties of the skew-normal distribution. The maximum likelihood estimates of the parameters of the skew-normal distribution are obtained with the expectation-maximization (EM) algorithm. The proposed model is illustrated with real data from an intercross experiment that shows a significant departure from the normality assumption, and the performance of skew-normal IM is assessed via stochastic simulation. The results indicate that skew-normal IM has higher power for QTL detection and better precision of QTL location than standard IM and nonparametric IM.
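The skew-normal density with location ξ, scale ω, and shape α is f(x) = (2/ω) φ(z) Φ(αz), where z = (x − ξ)/ω and φ and Φ are the standard normal density and distribution function. The minimal sketch below evaluates it. Setting the shape to zero recovers the normal density, which is the special-case property mentioned above. Only the density is shown here, not the EM fitting used in skew-normal IM, and the function names are illustrative.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, loc=0.0, scale=1.0, shape=0.0):
    """Skew-normal density; shape=0 gives the ordinary normal density."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(shape * z)
```

A positive shape skews the density to the right, and its mean shifts to loc + scale · δ · √(2/π) with δ = shape / √(1 + shape²). This is one way a skew-normal mixture can absorb asymmetry that a normal mixture would otherwise need a data transformation to handle.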
To make the quantitative results of nuclear magnetic resonance (NMR) transverse relaxation (T2) spectra reflect reservoir type and pore structure more directly, an unsupervised clustering method based on the Gaussian mixture model (GMM) was developed to extract quantitative pore structure information from NMR T2 spectra. First, principal component analysis was applied to the T2 spectra to reduce the data dimension and the dependence among the original variables. Second, the dimension-reduced data were fitted with the GMM probability density function, and the model parameters and the optimal number of clusters were obtained using the expectation-maximization algorithm and the change in the Akaike information criterion. Finally, the T2 spectrum features and pore structure types of the different clusters were analyzed and compared with the T2 geometric mean and T2 arithmetic mean. The effectiveness of the algorithm was verified with numerical simulations and field NMR logging data. The results show that the GMM-based clustering correlates well with the shape and distribution of the T2 spectrum, pore structure, and petroleum productivity. The method provides a new means for quantitative identification of pore structure, reservoir grading, and oil and gas productivity evaluation.
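Choosing the number of GMM clusters with the Akaike information criterion requires AIC = 2p − 2 ln L̂ for each candidate. Here p counts the free parameters and L̂ is the maximized likelihood returned by EM. The sketch below assumes full covariance matrices and simply picks the minimum AIC. The paper's rule, which is based on the change in AIC, may differ, and the log-likelihood values in the example are hypothetical.

```python
def gmm_num_params(n_components, n_dims):
    """Free parameters of a full-covariance Gaussian mixture."""
    weights = n_components - 1                                  # mixing weights sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2     # symmetric matrices
    return weights + means + covariances


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(log_likelihoods, n_dims):
    """log_likelihoods maps a candidate cluster count K to its maximized log-likelihood."""
    scores = {k: aic(ll, gmm_num_params(k, n_dims)) for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores
```

For two principal-component scores and hypothetical log-likelihoods {1: −500, 2: −450, 3: −448}, the AIC values are 1010, 922, and 930, so K = 2 is selected. The small likelihood gain at K = 3 does not pay for its 6 extra parameters.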
We propose a robust visual tracking framework based on the particle filter to handle changes in object appearance caused by varying illumination, pose variations, and occlusions. We mainly improve the observation model and the resampling process of the particle filter. An online-updated appearance model, affine transformation, and M-estimation are combined to construct an adaptive observation model. The online-updated appearance model partially adapts to illumination changes, an affine-transformation-based similarity measure handles pose variations, and M-estimation handles occlusion when computing the observation likelihood. To take advantage of the most recent observation and produce a suboptimal Gaussian proposal distribution, we incorporate a Kalman filter into the particle filter to improve the resampling process. To estimate the posterior probability density properly with lower computational complexity, only a single Kalman filter is used to propagate the Gaussian distribution. Experimental results on recorded video sequences demonstrate the effectiveness and robustness of the proposed algorithm.
Remaining useful life (RUL) estimation approaches based on degradation data have developed considerably, and significant advances have been made. Establishing an applicable degradation model of the system is the foundation and key to accurately estimating its RUL. Most current research focuses on age-dependent degradation models, but some degradation processes in engineering have been found to depend also on the degradation state itself. In addition, because of varying working conditions and complex environments, two problems affect the accuracy of RUL estimation. Systems from the same batch show unit-to-unit variability in their degradation, and the actual degradation states cannot be observed directly. To address these issues jointly, we develop an age-dependent and state-dependent nonlinear degradation model that accounts for unit-to-unit variability and hidden degradation states. The Kalman filter (KF) is then used to update the hidden degradation states in real time, and the expectation-maximization (EM) algorithm is applied to estimate the unknown model parameters adaptively. An approximate analytical RUL distribution is obtained from the concept of the first hitting time. Whenever a new observation becomes available, the RUL distribution is updated adaptively from the updated degradation states and model parameters. The effectiveness and accuracy of the proposed approach are demonstrated by a numerical simulation and case studies on Li-ion batteries and rolling element bearings.
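The real-time update of a hidden degradation state can be illustrated with a scalar Kalman filter. The sketch assumes a deliberately simplified linear-drift state-space model: x_k = x_{k−1} + μΔt + w_k with w_k ~ N(0, q), and y_k = x_k + v_k with v_k ~ N(0, r). The paper's nonlinear age- and state-dependent drift would need an extension such as an extended Kalman filter. The EM step that re-estimates μ, q, and r is not shown, and all names are hypothetical.

```python
def kalman_step(x_est, p_est, y, drift, dt, q, r):
    """One predict/update cycle for the scalar degradation state."""
    # Predict: the state drifts by drift*dt; uncertainty grows by q.
    x_pred = x_est + drift * dt
    p_pred = p_est + q
    # Update: blend prediction and measurement y using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def filter_degradation(observations, drift, dt, q, r, x0=0.0, p0=1.0):
    """Run the filter over a measurement sequence; returns (estimate, variance) pairs."""
    x, p = x0, p0
    history = []
    for y in observations:
        x, p = kalman_step(x, p, y, drift, dt, q, r)
        history.append((x, p))
    return history
```

Each new measurement shrinks the posterior variance of the hidden state. A first-hitting-time RUL distribution would then be recomputed from the updated (estimate, variance) pair.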
A channel estimation approach is proposed for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) in rapidly fading channels. The approach combines the advantages of a least-squares algorithm based on an optimal training sequence (OLS) with those of an expectation-maximization (EM) algorithm. The channels at the training blocks are estimated with an OLS-based estimator. To compensate for fast Rayleigh fading at the data blocks, a time-domain Gaussian interpolation filter is presented. An EM algorithm is then introduced to improve the channel estimates over a few iterations. Simulations show that this approach can effectively track rapid channel variation.
The direction of arrival (DOA) estimation problem in the presence of sensor location errors is studied, and an algorithm based on space-alternating generalized expectation-maximization (SAGE) is presented. The narrowband case is considered first. Under a small-perturbation assumption, an augmentation scheme is proposed to estimate the DOA and the perturbation parameters jointly, and the E-step and M-step of the SAGE algorithm for this case are derived. The algorithm is then extended to the wideband case, where the wideband SAGE algorithm is derived in the frequency domain by combining all frequency bins. Simulation results show that the algorithm converges well and achieves high parameter estimation precision.
Recently, the exponential rise in communication system demands has motivated academia and industry worldwide to develop efficient communication technologies that meet energy efficiency and quality of service (QoS) requirements. Wireless sensor networks (WSNs), one of the most efficient of these technologies, have immense potential to serve major communication purposes, including civil, defense, and industrial applications. Adding sensor mobility to WSNs has broadened their range of applications. The effectiveness of a WSN can be characterized by its ability to gather data efficiently and transmit it to the base station for decision-making. Clustering-based routing has been one of the dominant techniques for WSN systems. However, key issues remain open research problems: cluster formation, selection of the number of clusters and of cluster heads, and the decision on data transmission from sensors to the mobile sink. In this paper, a robust and energy-efficient data gathering protocol for WSNs with a single mobile sink is proposed. Unlike existing approaches, it develops an enhanced centralized clustering model based on the expectation-maximization concept (EEM). The model is further strengthened by an optimal cluster count estimation technique, which ensures that the number of clusters in the network region does not cause unnecessary energy exhaustion. Meanwhile, the relative distances from a sensor node to the cluster head and to the mobile sink are used to make the transmission (path) decision. The results show that the proposed EEM-based clustering with optimal cluster selection and optimal dynamic transmission decisions enables higher throughput, faster data gathering, minimal delay and energy consumption, and higher …
A new joint channel estimation and data detection algorithm is proposed for multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. It uses a space-alternating generalized expectation-maximization (SAGE) algorithm based on linear minimum mean square error (LMMSE) estimation. In the proposed algorithm, every sub-frame of the MIMO-OFDM system is divided into several OFDM sub-blocks, and the LMMSE-based SAGE algorithm is applied within each sub-block. Training symbols inserted at the head of each sub-frame are used for the initial estimation. The channel estimate of the previous sub-block serves as the initial estimate for the current sub-block. Maximum-likelihood (ML) detection is then used to update the channel estimate and the data detection iteratively until convergence, and all sub-blocks are processed in turn in this way. Simulation results show that the proposed algorithm improves the bit error rate (BER) performance.
In this paper, an evolutionary recursive Bayesian estimation algorithm is presented that incorporates the latest observation through a new proposal distribution. The posterior state density is represented by a Gaussian mixture model, which is recovered from the weighted particle set of the measurement update step by a weighted expectation-maximization algorithm. This step replaces the resampling stage required by most particle filters and mitigates sample impoverishment. Results on a nonlinear tracking problem show that the new approach outperforms other related particle filters.
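Weighted EM fits a Gaussian mixture to a set of particles that carry importance weights. Each particle's responsibilities are scaled by its weight, so heavily weighted particles pull the mixture harder. The sketch below assumes a one-dimensional state, and its function names are hypothetical. In a filter of the kind described, new particles would then be drawn from the fitted mixture instead of being resampled.

```python
import math


def weighted_em_gmm(particles, weights, mix, means, variances, n_iter=50):
    """Fit a 1-D Gaussian mixture to a weighted particle set by weighted EM."""
    total = sum(weights)
    w = [wi / total for wi in weights]          # normalised particle weights
    k = len(means)
    for _ in range(n_iter):
        n_k, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
        for x, wi in zip(particles, w):
            dens = [mix[j] * math.exp(-(x - means[j]) ** 2 / (2.0 * variances[j]))
                    / math.sqrt(2.0 * math.pi * variances[j]) for j in range(k)]
            norm = sum(dens)
            if norm == 0.0:
                continue                         # particle numerically far from all components
            for j in range(k):
                r = wi * dens[j] / norm          # particle weight times responsibility
                n_k[j] += r
                s1[j] += r * x
                s2[j] += r * x * x
        mass = sum(n_k)
        mix = [n / mass for n in n_k]
        means = [s1[j] / n_k[j] for j in range(k)]
        variances = [max(s2[j] / n_k[j] - means[j] ** 2, 1e-9) for j in range(k)]
    return mix, means, variances
```

With a single component, the result reduces to the weighted mean and weighted variance of the particles. For example, particles `[0, 1, 2]` with weights `[1, 1, 2]` give mean 1.25 and variance 0.6875.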
Sample size re-estimation is essential in oncology studies. However, blinded sample size reassessment for survival data has rarely been reported. Based on the density function of the exponential distribution, an expectation-maximization (EM) algorithm for the hazard ratio was derived, and several simulation studies were conducted to verify its application. The method showed marked variation in the hazard ratio estimates and overestimated relatively small hazard ratios. The studies showed three things. The stability of the EM estimates was directly correlated with the sample size, the convergence of the EM algorithm was affected by the initial values, and a balanced design produced the best estimates. No reliable inference for blinded sample size re-estimation could be drawn from these studies. However, the results provide useful information that may keep practitioners in this field from repeating the same endeavor.
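A blinded EM of this kind can be sketched as a two-component mixture of exponentials under several assumptions. Each arm has exponential survival with rate λ1 or λ2. The known 1:1 allocation is used as a fixed mixing weight. Right-censored times come with event indicators. The E-step gives each patient's probability of belonging to arm 1, and the M-step is the closed-form ratio of expected events to expected exposure time. This is a generic mixture-of-exponentials EM, not necessarily the paper's exact derivation, and the function name is hypothetical. Blinded data cannot identify which component is which arm, so the labels are arbitrary.

```python
import math


def blinded_em_hazard_ratio(times, events, lam1, lam2, alloc=0.5, n_iter=500, tol=1e-10):
    """EM for two exponential arms from pooled (blinded) data; returns (lam1, lam2, HR)."""
    for _ in range(n_iter):
        # E-step: posterior probability that each patient is in arm 1.
        resp = []
        for t, d in zip(times, events):
            a = alloc * (lam1 ** d) * math.exp(-lam1 * t)
            b = (1.0 - alloc) * (lam2 ** d) * math.exp(-lam2 * t)
            resp.append(a / (a + b) if a + b > 0 else alloc)
        # M-step: rate = expected events / expected exposure time.
        new1 = sum(r * d for r, d in zip(resp, events)) / sum(r * t for r, t in zip(resp, times))
        new2 = (sum((1 - r) * d for r, d in zip(resp, events))
                / sum((1 - r) * t for r, t in zip(resp, times)))
        converged = abs(new1 - lam1) < tol and abs(new2 - lam2) < tol
        lam1, lam2 = new1, new2
        if converged:
            break
    return lam1, lam2, lam2 / lam1
```

Starting both rates at the same value leaves every responsibility at 0.5, so EM stays at a hazard ratio of exactly 1. This is one concrete way in which the initial values determine where the algorithm converges.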
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When clustering is applied to fish abundance data, many technical details must be considered to ensure a reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods under different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation with the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to disagree on the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log transformation, had substantial influences on the clustering results, especially for KM. Transformation also influenced clustering stability; scaling tended to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect generally leveled off above 70 samples, most quickly for EM. We conclude that the best clustering method can be chosen according to the aim of the study and the number of clusters; in general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analysis.
This study helps ensure the credibility of the application and interpretation of clustering methods.
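The APN index can be computed by comparing cluster labels obtained on the full data with labels obtained after deleting one variable at a time. For each observation, it measures the fraction of its full-data cluster that it no longer shares a cluster with, averaged over observations and deletions. The sketch below follows that definition as described for the clValid R package. It assumes the clustering (HC, KM, or EM) has already been rerun elsewhere for each deletion, and the function name is hypothetical.

```python
def apn(full_labels, reduced_labels_list):
    """Average proportion of non-overlap; 0 means perfectly stable clusters."""
    n = len(full_labels)
    total = 0.0
    for reduced in reduced_labels_list:          # one labelling per deleted variable
        for i in range(n):
            same_full = {j for j in range(n) if full_labels[j] == full_labels[i]}
            same_reduced = {j for j in range(n) if reduced[j] == reduced[i]}
            total += 1.0 - len(same_full & same_reduced) / len(same_full)
    return total / (n * len(reduced_labels_list))
```

Only co-membership matters, so relabeling clusters does not change the score. `apn([0, 0, 1, 1], [[1, 1, 0, 0]])` is 0, whereas the crossed partition `[[0, 1, 0, 1]]` gives 0.5.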
Funding (parallel EM algorithm for large databases): National Natural Science Foundation of China (79990584)
Funding (2-D polynomial EM-based estimation for time-varying OFDM): National Natural Science Foundation of China (No. 60472026)
Funding (improved GMM-based clustering): National Natural Science Foundation of China (No. 61105048, 60972165); Doctoral Fund of Ministry of Education of China (No. 20110092120034); Natural Science Foundation of Jiangsu Province (No. BK2010240); Technology Foundation for Selected Overseas Chinese Scholar, Ministry of Human Resources and Social Security of China (No. 6722000008); Open Fund of Jiangsu Province Key Laboratory for Remote Measuring and Control (No. YCCK201005)
Funding (MOS-SAGE PET image reconstruction): National Basic Research Program of China (973 Program) (No. 2003CB716102)
Funding: Supported by the National Natural Science Foundation of China, No. 10872069
Abstract: Predicting neuron growth is valuable for understanding neuron morphology and is therefore helpful for research on neuron classification. This study proposes a new method for predicting the growth of human neurons. It uses 1,907 sets of data on human brain pyramidal neurons obtained from the NeuroMorpho.Org website. First, we analyzed the neurons in the morphology domain and used an expectation-maximization (EM) algorithm to assign them to six clusters. Second, a naive Bayes classifier was used to verify the accuracy of the EM clustering. The experimental results showed that the resulting cluster groups were efficient and feasible. Finally, a new method for ranking the six EM-derived classes was used to predict the growth of human pyramidal neurons.
Funding: Project supported in part by the Foundation for Science and Technology (FCT) (No. SFRD/BD/5987/2001) and the Operational Program Science, Technology, and Innovation of the FCT, co-financed by the European Regional Development Fund (ERDF)
Abstract: In standard interval mapping (IM) of quantitative trait loci (QTL), the QTL effect is described by a normal mixture model. When this normality assumption is violated, the most common strategy is to apply the same model after transforming the data. However, an appropriate transformation may not exist or may be difficult to find, and this approach can also raise interpretation issues. An interesting alternative is to use a skew-normal mixture model in standard IM; the resulting method is denoted here as skew-normal IM. This flexible model includes the usual symmetric normal distribution as a special case and allows continuous variation from normality to non-normality. In this paper we briefly introduce the main properties of the skew-normal distribution. The maximum likelihood estimates of its parameters are obtained with the expectation-maximization (EM) algorithm. The proposed model is illustrated with real data from an intercross experiment that shows a significant departure from the normality assumption. The performance of skew-normal IM is assessed via stochastic simulation. Skew-normal IM has higher power for QTL detection and better precision of QTL location than standard IM and nonparametric IM.
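For reference, a common (Azzalini) parameterization of the skew-normal density is f(y) = (2/σ) φ(z) Φ(λz), with z = (y − μ)/σ. Here φ and Φ are the standard normal density and distribution function.

The abstract does not state which parameterization the paper uses, so the sketch below assumes this one. It only evaluates the density and shows that λ = 0 recovers the normal case. It does not implement the EM fitting. The function names are hypothetical.

```python
import math


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(y, mu=0.0, sigma=1.0, lam=0.0):
    """Skew-normal density with location mu, scale sigma and shape lam.

    lam = 0 gives the normal N(mu, sigma^2); lam > 0 skews to the right.
    """
    z = (y - mu) / sigma
    return 2.0 / sigma * std_normal_pdf(z) * std_normal_cdf(lam * z)
```

Because Φ(0) = 1/2, the factor 2Φ(λz) equals 1 when λ = 0. This is how the model contains the symmetric normal distribution as a special case.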
Funding: Supported by the following:
- the National Natural Science Foundation of China (42174142);
- the National Science and Technology Major Project (2017ZX05039-002);
- the Operation Fund of China National Petroleum Corporation Logging Key Laboratory (2021DQ20210107-11);
- the Fundamental Research Funds for Central Universities (19CX02006A);
- the Major Science and Technology Project of China National Petroleum Corporation (ZD2019-183-006).
Abstract: An unsupervised clustering method based on the Gaussian mixture model (GMM) was developed to obtain quantitative pore structure information from nuclear magnetic resonance (NMR) transverse relaxation (T₂) spectra. The aim is for the quantitative results to reflect the reservoir type and pore structure more directly.

First, we performed principal component analysis on the T₂ spectra to reduce the data dimension and the dependence among the original variables. Second, the dimension-reduced data were fitted with the GMM probability density function. The model parameters were obtained with the expectation-maximization algorithm, and the optimal number of clusters was chosen from the change in the Akaike information criterion. Finally, the T₂ spectrum features and pore structure types of the different clusters were analyzed and compared with the T₂ geometric mean and the T₂ arithmetic mean.

The effectiveness of the algorithm was verified with numerical simulation and field NMR logging data. The GMM-based clustering results correlate well with the shape and distribution of the T₂ spectrum, with the pore structure, and with petroleum productivity. The method provides a new means for quantitative identification of pore structure, reservoir grading, and oil and gas productivity evaluation.
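The model-selection step can be illustrated with a small sketch. It assumes full-covariance Gaussian components and takes the maximized log-likelihood of each candidate fit as given; the EM fitting and the PCA step are not shown.

The abstract says the number of clusters was chosen from the change in AIC, which may be an elbow-type rule. For simplicity, this sketch reports the AIC of each candidate and picks the minimum. The log-likelihood values and the choice of two principal components are made-up placeholders, not results from the study.

```python
def gmm_num_params(n_components, dim, covariance="full"):
    """Count the free parameters of a Gaussian mixture in `dim` dimensions."""
    weights = n_components - 1                       # mixing weights sum to one
    means = n_components * dim
    if covariance == "full":
        covs = n_components * dim * (dim + 1) // 2   # symmetric covariance matrices
    elif covariance == "diag":
        covs = n_components * dim
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    return weights + means + covs


def aic(log_likelihood, n_params):
    """Akaike information criterion, 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def aic_table(loglik_by_k, dim, covariance="full"):
    """Map each candidate cluster count K to its AIC, given maximized log-likelihoods."""
    return {k: aic(ll, gmm_num_params(k, dim, covariance)) for k, ll in loglik_by_k.items()}


# Hypothetical maximized log-likelihoods from EM fits on 2 principal components
scores = aic_table({1: -500.0, 2: -450.0, 3: -445.0}, dim=2)
best_k = min(scores, key=scores.get)   # scores == {1: 1010.0, 2: 922.0, 3: 924.0}; best_k == 2
```

The third component improves the log-likelihood by only 5, while it costs 6 extra parameters (12 AIC units). AIC therefore prefers K = 2 in this toy case.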
Funding: Supported by the National Natural Science Foundation of China (No. 40627001) and the 985 Innovation Project on Information Technique of Xiamen University (2004–2008)
Abstract: We propose a robust visual tracking framework based on the particle filter. It handles changes in object appearance caused by varying illumination, pose variations, and occlusions. We mainly improve the observation model and the resampling process of the particle filter.

We construct an adaptive observation model from an online-updated appearance model, affine transformation, and M-estimation:
- The online-updated appearance model partially adapts to illumination changes.
- An affine-transformation-based similarity measure handles pose variations.
- M-estimation handles occlusion when computing the observation likelihood.

To exploit the most recent observation and produce a suboptimal Gaussian proposal distribution, we incorporate a Kalman filter into the particle filter. This improves the resampling process. To estimate the posterior probability density properly at lower computational cost, we employ only a single Kalman filter to propagate the Gaussian distribution. Experimental results on recorded video sequences demonstrate the effectiveness and robustness of the proposed tracking algorithm.
Funding: Supported by the National Key R&D Program of China (2018YFB1306100), the National Natural Science Foundation of China (61922089, 61833016, 62073336, 61903376, 61773386), and the National Science Foundation of Shaanxi Province (2020JQ-489, 2020JM-360).
Abstract: Remaining useful life (RUL) estimation approaches based on degradation data have developed greatly, and significant advances have been made. An applicable degradation model of the system is the foundation for accurately estimating its RUL. Most current research focuses on age-dependent degradation models. However, some degradation processes in engineering also depend on the degradation states themselves.

In addition, varying working conditions and complex engineering environments cause two problems that reduce the accuracy of RUL estimation. First, systems from the same batch show unit-to-unit variability in their degradation. Second, the actual degradation states cannot be observed directly. To address these issues jointly, we develop a nonlinear degradation model that is both age-dependent and state-dependent and that accounts for unit-to-unit variability and hidden degradation states.

The Kalman filter (KF) then updates the hidden degradation states in real time, and the expectation-maximization (EM) algorithm adaptively estimates the unknown model parameters. An approximate analytical RUL distribution is obtained from the concept of the first hitting time. Whenever a new observation becomes available, the RUL distribution is updated adaptively from the updated degradation states and model parameters. A numerical simulation and case studies of Li-ion batteries and rolling element bearings demonstrate the effectiveness and accuracy of the proposed approach.
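The real-time state update can be illustrated with a scalar Kalman filter. This is a deliberately simplified, hypothetical stand-in. It assumes a linear degradation increment with constant drift and Gaussian process and measurement noise. The paper's model, by contrast, is nonlinear and both age- and state-dependent.

In an EM-based scheme like the one described, quantities such as `drift`, `q`, and `r` would be among the unknown parameters re-estimated as data arrive. Here they are simply passed in as fixed values.

```python
def kalman_step(x_est, p_est, y, drift, q, r):
    """One predict/update cycle for x_k = x_{k-1} + drift + w_k, y_k = x_k + v_k.

    q: variance of the process noise w_k; r: variance of the measurement noise v_k.
    Returns the updated state mean and variance.
    """
    # Predict the hidden degradation state and its variance
    x_pred = x_est + drift
    p_pred = p_est + q
    # Update with the new noisy degradation measurement
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def filter_states(observations, x0, p0, drift, q, r):
    """Run the filter over a sequence and return (state mean, variance) pairs."""
    x, p = x0, p0
    history = []
    for y in observations:
        x, p = kalman_step(x, p, y, drift, q, r)
        history.append((x, p))
    return history
```

The gain weighs prediction against measurement. When `r` is large relative to the predicted variance, the estimate stays close to the drift-based prediction. When `r` is small, it follows the measurement.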
Funding: Project supported by the National High-Technology Research and Development Program of China (Grant No. 2003AA123-31007) and the National Natural Science Foundation of China (Grant No. 60272079)
Abstract: A channel estimation approach is proposed for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) in rapidly fading channels. It combines the advantages of an optimal-training-sequence-based least-squares (OLS) algorithm and an expectation-maximization (EM) algorithm. The channels at the training blocks are estimated with an OLS-based estimator. To compensate for fast Rayleigh fading at the data blocks, a time-domain Gaussian interpolation filter is presented. An EM algorithm then improves the channel estimate within a few iterations. Simulations show that this approach can effectively track rapid channel variation.
Abstract: This paper studies direction-of-arrival (DOA) estimation in the presence of sensor location errors and presents an algorithm based on space-alternating generalized expectation-maximization (SAGE). First, the narrowband case is considered. Based on a small-perturbation assumption, an augmentation scheme is proposed to estimate the DOA and the perturbation parameters, and the E-step and M-step of the SAGE algorithm for this case are derived. The algorithm is then extended to the wideband case. The wideband SAGE algorithm is derived in the frequency domain by jointly using all frequency bins. Simulation results show that the algorithm converges well and achieves high parameter estimation precision.
Abstract: The exponential rise in demand on communication systems has motivated academia and industry worldwide to develop efficient communication technologies. These technologies must meet energy-efficiency and Quality of Service (QoS) requirements. Wireless Sensor Networks (WSNs), one of the most efficient of these technologies, have immense potential for civil, defense, and industrial applications. Adding sensor mobility to WSNs has broadened their range of applications. The effectiveness of a WSN can be characterized by its ability to gather data efficiently and transmit it to the base station for decision-making.

Clustering-based routing has been one of the dominant techniques for WSN systems. However, several key issues remain open research problems:
- cluster formation;
- selection of the number of clusters and of the cluster heads;
- the decision on data transmission from sensors to the mobile sink.

In this paper, a robust and energy-efficient data gathering protocol for a WSN with a single mobile sink is proposed. Unlike existing approaches, it uses an enhanced centralized clustering model based on the expectation-maximization concept (EEM). An optimal cluster count estimation technique strengthens the model by ensuring that the number of clusters in the network region does not cause unnecessary energy depletion. The transmission (path) decision is based on the relative distances from a sensor node to its cluster head and to the mobile sink. Results show that the proposed EEM-based clustering with optimal cluster selection and optimal dynamic transmission decision enables higher throughput, faster data gathering, minimal delay and energy consumption, and higher
Funding: Supported by the National Natural Science Foundation of China (No. 61001105), the National Science and Technology Major Projects (No. 2011ZX03001-007-03), and the Beijing Natural Science Foundation (No. 4102043).
Abstract: A new joint channel estimation and data detection algorithm is proposed for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. It uses a space-alternating generalized expectation-maximization (SAGE) algorithm based on the linear minimum mean square error (LMMSE).

The algorithm proceeds as follows:
1. Every sub-frame of the MIMO-OFDM system is divided into several OFDM sub-blocks, and the LMMSE-based SAGE algorithm is applied in each sub-block.
2. Training symbols inserted at the head of each sub-frame are used for the initial estimation.
3. The channel estimate of the previous sub-block serves as the initial estimate for the current sub-block.
4. Maximum-likelihood (ML) detection updates the channel estimate and the detected data iteratively until convergence.
5. The sub-blocks are processed in turn in this way.

Simulation results show that the proposed algorithm improves bit error rate (BER) performance.
Funding: Sponsored by the National Security Major Basic Research Project of China (Grant No. 973-61334)
Abstract: In this paper, an evolutionary recursive Bayesian estimation algorithm is presented that incorporates the latest observation via a new proposal distribution. The posterior state density is represented by a Gaussian mixture model. This model is recovered from the weighted particle set of the measurement update step by a weighted expectation-maximization algorithm. This step replaces the resampling stage required by most particle filters and alleviates sample impoverishment. Results on a nonlinear tracking problem show that the new approach outperforms other related particle filters.
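The weighted EM step can be sketched for a one-dimensional mixture. In weighted EM, each particle's normalized importance weight multiplies its contribution to every M-step sum. With equal weights, the procedure therefore reduces to ordinary EM. The one-dimensional state, the fixed number of components, and the function name `weighted_em_gmm_1d` are assumptions for illustration; the paper's state space and initialization may differ.

```python
import math


def weighted_em_gmm_1d(x, w, means, variances, mix, n_iter=50):
    """Fit a 1-D Gaussian mixture to particles x that carry importance weights w.

    means, variances, mix: initial component parameters (each of length K).
    Returns the refined (means, variances, mix).
    """
    total = sum(w)
    w = [wi / total for wi in w]               # normalize particle weights
    means, variances, mix = list(means), list(variances), list(mix)
    K, n = len(means), len(x)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for numerical stability
        resp = []
        for xi in x:
            logs = [math.log(mix[k]) - 0.5 * math.log(2 * math.pi * variances[k])
                    - 0.5 * (xi - means[k]) ** 2 / variances[k] for k in range(K)]
            top = max(logs)
            ex = [math.exp(lg - top) for lg in logs]
            s = sum(ex)
            resp.append([e / s for e in ex])
        # M-step: every sum is weighted by the particle weight w[i]
        for k in range(K):
            nk = sum(w[i] * resp[i][k] for i in range(n))
            if nk < 1e-12:                     # empty component: keep old values
                continue
            mix[k] = nk
            means[k] = sum(w[i] * resp[i][k] * x[i] for i in range(n)) / nk
            var = sum(w[i] * resp[i][k] * (x[i] - means[k]) ** 2 for i in range(n)) / nk
            variances[k] = max(var, 1e-9)      # guard against variance collapse
    return means, variances, mix
```

As an example, take three particles around −2 and three around 2, with weights 3:1 in favor of the left group. The fitted mixing weights approach 0.75 and 0.25, even though the particle counts are equal.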
Funding: Supported by the National Natural Science Foundation of China (81273184) and the National Natural Science Foundation of China Grant for Young Scientists (81302512)
Abstract: Sample size re-estimation is essential in oncology studies. However, blinded sample size reassessment for survival data has rarely been reported. Based on the density function of the exponential distribution, an expectation-maximization (EM) algorithm for the hazard ratio was derived, and several simulation studies were used to verify its application. The hazard ratio estimates varied markedly, and relatively small hazard ratios were overestimated. The stability of the EM estimates was directly correlated with the sample size. The convergence of the EM algorithm was affected by the initial values, and a balanced design produced the best estimates. No reliable blinded sample size re-estimation inference could be made in our studies. Even so, the results can steer practitioners in this field away from repeating the same endeavor.
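The following is a minimal sketch of the kind of EM described above, not the authors' derivation. It assumes 1:1 randomization with a known allocation proportion, exponential survival in each arm, and right censoring.

In the blinded data, the arm labels are the missing data. The E-step computes each subject's posterior probability of belonging to arm 1. The M-step updates each hazard as expected events divided by expected exposure time. The starting hazards are arbitrary, which matters because the abstract reports sensitivity to initial values.

```python
import math


def blinded_em_exponential(times, events, p=0.5, lam1=0.1, lam2=0.2, n_iter=200):
    """EM for pooled survival data from two exponential arms with unknown labels.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    p: known allocation proportion of arm 1 (1:1 randomization assumed).
    lam1, lam2: starting hazards (arbitrary choices for illustration).
    Returns (lam1, lam2, hazard_ratio, log_likelihood_history).
    """
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that each subject belongs to arm 1.
        # One subject's likelihood under hazard lam is lam**event * exp(-lam * t).
        resp, ll = [], 0.0
        for t, d in zip(times, events):
            f1 = p * lam1 ** d * math.exp(-lam1 * t)
            f2 = (1 - p) * lam2 ** d * math.exp(-lam2 * t)
            resp.append(f1 / (f1 + f2))
            ll += math.log(f1 + f2)
        history.append(ll)                     # observed-data log-likelihood
        # M-step: exponential MLE = expected events / expected exposure time
        lam1 = (sum(r * d for r, d in zip(resp, events))
                / sum(r * t for r, t in zip(resp, times)))
        lam2 = (sum((1 - r) * d for r, d in zip(resp, events))
                / sum((1 - r) * t for r, t in zip(resp, times)))
    return lam1, lam2, lam2 / lam1, history
```

The recorded log-likelihood should never decrease across iterations, which is a useful sanity check for any EM implementation. The returned ratio is λ2/λ1. Because the arm labels are never observed, nothing ties a fitted component to a particular arm. This label ambiguity is one plausible source of the instability the abstract reports.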
Funding: Provided by the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) (No. 2018SDKJ0501-2).
Abstract: Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When they are applied to fish abundance data, many technical details must be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods: hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM). The evaluation is based on fish community surveys in the coastal waters of Shandong, China. We compared the three methods across different numbers of clusters, data sizes, and data transformation approaches. The focus was on consistency validation using the average proportion of non-overlap (APN) index.

The three methods tend to disagree on the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log transformation, strongly influenced the clustering results, especially for KM. Transformation also influenced clustering stability, and scaling tended to provide a stable solution for the same number of clusters. The APN values indicated that stability improved with increasing data size. This effect generally leveled off above 70 samples and did so most quickly for EM.

We conclude that the best clustering method depends on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps to ensure the credibility of the application and interpretation of clustering methods.
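The APN stability index can be computed from cluster labels alone. A common definition, used for example in the R package clValid, is:

APN = (1/(MN)) Σ_i Σ_l (1 − |C(i,l) ∩ C(i,0)| / |C(i,0)|)

Here N is the number of observations and M is the number of columns. C(i,0) is the cluster containing observation i on the full data, and C(i,l) is its cluster after column l is deleted. The sketch assumes the clusterings have already been produced by HC, KM, or EM. The abstract does not state how columns were deleted in this study. The function name `apn` is hypothetical.

```python
def apn(full_labels, reduced_labelings):
    """Average proportion of non-overlap (APN) for one clustering.

    full_labels: cluster label of each observation using all columns.
    reduced_labelings: one label vector per deleted column, produced by
        re-running the same clustering method without that column.
    Returns a value in [0, 1]; 0 means perfectly stable assignments.
    """
    n = len(full_labels)
    # Members of each observation's cluster on the full data
    full_sets = [{j for j in range(n) if full_labels[j] == full_labels[i]}
                 for i in range(n)]
    total = 0.0
    for labels in reduced_labelings:
        for i in range(n):
            reduced = {j for j in range(n) if labels[j] == labels[i]}
            total += 1.0 - len(full_sets[i] & reduced) / len(full_sets[i])
    return total / (n * len(reduced_labelings))
```

Only co-membership is compared, so APN does not depend on how clusters are named. For example, relabeling [0, 0, 1, 1] as [5, 5, 7, 7] gives an APN of 0.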