The topic of this article is one-sided hypothesis testing for disparity, i.e., testing whether the mean of one group is larger than that of another when there is uncertainty about the group from which each datum is drawn. For each datum, the uncertainty is captured with a given discrete probability distribution over the groups. Such situations arise, for example, when Bayesian imputation methods are used to assess race and ethnicity disparities in certain insurance, health, and financial data. A widely used method for this assessment is Bayesian Improved Surname Geocoding (BISG), which assigns a discrete probability over six race/ethnicity groups to an individual given the individual's surname and address location. Using a Bayesian framework and Markov chain Monte Carlo sampling from the joint posterior distribution of the group means, the probability of a disparity hypothesis is estimated. Four methods are developed and compared on an illustrative data set. Three of these methods are implemented in R and one in WinBUGS. The methods are programmed for any number of groups between two and six inclusive. All the code is provided in the appendices.
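The final step described above, estimating the probability of the one-sided disparity hypothesis from posterior samples of the group means, can be sketched as follows. This is a minimal illustration with hypothetical Gaussian draws standing in for real MCMC output; the means, scales, and sample count are assumptions, not the article's data:

```python
import random

# Stand-in posterior draws of two group means (in practice these would come
# from MCMC over the BISG-weighted model described in the abstract).
random.seed(0)
n = 20_000
mu_a = [random.gauss(1.2, 0.3) for _ in range(n)]  # hypothetical draws, group A
mu_b = [random.gauss(1.0, 0.3) for _ in range(n)]  # hypothetical draws, group B

# Monte Carlo estimate of the one-sided disparity probability P(mu_a > mu_b):
# the fraction of joint posterior draws in which the hypothesis holds.
p_disparity = sum(a > b for a, b in zip(mu_a, mu_b)) / n
```

With these stand-in draws the difference of means is roughly N(0.2, 0.42²), so the estimate lands near 0.68; the same one-line tally applies unchanged to draws from any real sampler.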
A new three-parameter discrete distribution called the zero-inflated cosine geometric (ZICG) distribution is proposed for the first time herein. It can be used to analyze over-dispersed count data with excess zeros. The basic statistical properties of the new distribution, such as the moment generating function, mean, and variance, are presented. Furthermore, confidence intervals for the parameters of the ZICG distribution are constructed using the Wald, Bayesian, and highest posterior density (HPD) interval methods. Their efficacies were investigated using both simulation and real-world data comprising the number of daily COVID-19 positive cases at the Tokyo 2020 Olympic Games. The results show that the HPD interval performed better than the other methods in terms of coverage probability and average length in most of the cases studied.
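The HPD interval mentioned above can be approximated from posterior draws as the shortest interval covering a given credible mass. A minimal stdlib sketch of that sample-based construction, using skewed gamma draws as an illustrative stand-in for a ZICG-parameter posterior (not the paper's exact method):

```python
import math
import random

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing at least a `cred` fraction of the samples
    (a sample-based HPD approximation)."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(cred * n)  # number of points the interval must cover
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    return min(
        ((s[i], s[i + k - 1]) for i in range(n - k + 1)),
        key=lambda ab: ab[1] - ab[0],
    )

# For a right-skewed posterior the HPD interval sits left of, and is shorter
# than, the equal-tailed interval.
random.seed(1)
draws = [random.gammavariate(2.0, 1.0) for _ in range(5000)]
lo, hi = hpd_interval(draws)
```

By construction the returned window covers at least the requested fraction of the draws, which is why HPD intervals compare favorably on average length for skewed posteriors.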
Reconfigurable intelligent surface (RIS) technology employs passive beamforming to control the wireless propagation channel, which benefits the communication capacity and the received energy efficiency of wireless power transfer (WPT) systems. Such beamforming schemes lead to discrete, non-convex integer programming problems. In this paper, we propose Monte Carlo (MC) based random energy passive beamforming for RIS to maximize the received power of electromagnetic (EM) WPT systems. Gibbs sampling and re-sampling methods are employed to generate phase-shift vector samples, and the sample with the maximum received power is taken as the solution. To suit different application scenarios, we develop two passive beamforming algorithms based on these MC sampling methods. The first uses an approximation of the integer program, calculated from the channel information, as the initial sample. The second is a purely randomized algorithm that requires only total received-power feedback. The proposed methods offer several advantages for RIS control, e.g., fast convergence, easy implementation, robustness to channel noise, and limited feedback requirements, and they are applicable even when the channel information is unknown. Simulation results show that our methods outperform other approximation and genetic algorithms. With our methods, the WPT system significantly improves power efficiency even in the non-line-of-sight (NLOS) environment.
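The purely randomized variant, which sees only total received-power feedback, can be sketched on a toy model. The channel, array size, and 2-bit phase codebook below are illustrative assumptions, not the paper's setup, and plain best-of-samples search stands in for the Gibbs re-sampling scheme:

```python
import cmath
import random

random.seed(0)
N = 16
# Made-up cascaded channel gains for an N-element RIS.
h = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
PHASES = [0.0, cmath.pi / 2, cmath.pi, 3 * cmath.pi / 2]  # 2-bit codebook

def received_power(theta):
    # |sum_k h_k * exp(j*theta_k)|^2 -- the only quantity fed back.
    return abs(sum(hk * cmath.exp(1j * tk) for hk, tk in zip(h, theta))) ** 2

# Keep the best phase-shift vector among random samples.
best_theta, best_p = None, -1.0
for _ in range(2000):
    theta = [random.choice(PHASES) for _ in range(N)]
    p = received_power(theta)
    if p > best_p:
        best_theta, best_p = theta, p
```

Any sampled power is bounded above by (Σ|h_k|)², the continuous-phase optimum, so the gap to that bound shows how close the discrete random search gets.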
Traditional topic models have been widely used for analyzing semantic topics in electronic documents. However, the topic words they produce are often poorly readable and inconsistent; typically only domain experts can guess their meaning. In fact, phrases are the main unit people use to express semantics. This paper presents Distributed Representation-Phrase Latent Dirichlet Allocation (DR-Phrase LDA), a phrase topic model. Specifically, the model enhances the semantic information of phrases via distributed representations. Experimental results show that the topics acquired by our model are more readable and consistent than those of similar topic models.
The estimation of sparse underwater acoustic (UWA) channels can be regarded as an inference problem involving hidden variables within the Bayesian framework. While classical sparse Bayesian learning (SBL), derived through the expectation-maximization (EM) algorithm, has been widely employed for UWA channel estimation, it still differs from the true posterior expectation of the channel. In this paper, we propose an approach that combines variational inference (VI) and Markov chain Monte Carlo (MCMC) methods to provide a more accurate posterior estimate. Specifically, SBL is first re-derived with VI, allowing us to replace the posterior distribution of the hidden variables with a variational distribution. We then determine the full conditional probability distribution of each variable in the variational distribution and iteratively perform random Gibbs sampling in MCMC until the Markov chain converges. Simulation and experimental results indicate that our estimation method achieves lower mean square error and bit error rate than the classical SBL approach, with an acceptable convergence speed.
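The Gibbs step at the core of the MCMC stage, redrawing each variable from its full conditional in turn, can be illustrated on a toy target. The bivariate normal with correlation rho below is an assumed stand-in, not the UWA channel posterior:

```python
import random

# Gibbs sampling for a toy bivariate normal with correlation rho:
# alternate draws from the two full conditionals p(x|y) and p(y|x).
rho = 0.8
sd = (1 - rho**2) ** 0.5  # conditional standard deviation
random.seed(0)
x = y = 0.0
xs, ys = [], []
for t in range(20_000):
    x = random.gauss(rho * y, sd)  # draw from p(x | y)
    y = random.gauss(rho * x, sd)  # draw from p(y | x)
    if t >= 2_000:                 # discard burn-in before collecting
        xs.append(x)
        ys.append(y)
```

After burn-in the retained draws recover the target's moments (zero means, unit variances, correlation near rho), which is the convergence behavior the full-conditional iteration relies on.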
Starting from fintech big data and taking the Gibbs Sampling algorithm from artificial intelligence as its tool, this paper establishes, within a big-data framework, a general procedure for screening characteristic factors of corporate financial-fraud risk together with inference principles for feature extraction. An empirical analysis is conducted on the financial statements of listed companies, using samples of information-disclosure violations in listed-company financial reports identified by the China Securities Regulatory Commission from January 2017 to December 2018. Characteristic factors describing financial fraud are screened out and validated in tests, supporting the identification of financial fraud. The framework and modeling method proposed in this paper can strengthen and improve the ability to identify financial-fraud risk in listed companies, and enable detecting and predicting corporate financial fraud.
In practice, the failure rate of most equipment exhibits different tendencies at different stages, and its failure rate curve may even follow a multimodal trace over the life cycle. As a result, traditionally evaluating the reliability of equipment with a single model may lead to severe errors. However, if the lifetime is divided into several intervals according to the characteristics of the failure rate, piecewise fitting can approximate the failure rate of equipment more accurately. Therefore, in this paper, the failure rate is regarded as a piecewise function, and two kinds of segmented distribution are put forward to evaluate reliability. To estimate the parameters of the segmented reliability function, Bayesian estimation and maximum likelihood estimation (MLE) of the segmented distribution are discussed. Since traditional information criteria are not suitable for the segmented distribution, an improved information criterion is proposed to test and evaluate the segmented reliability model. After extensive testing and verification, the segmented reliability model and its estimation methods presented in this paper prove more efficient and accurate than the traditional non-segmented single model, especially when the change of the failure rate is time-phased or multimodal. The strong performance of the segmented reliability model in evaluating the reliability of proximity sensors of leading-edge flaps in civil aircraft indicates that the segmented distribution and its estimation method could be useful and accurate.
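The piecewise view of the failure rate can be made concrete with a classic bathtub-shaped example: a decreasing burn-in hazard, a constant useful-life hazard, and an increasing wear-out hazard, continuous at the segment boundaries. The specific segment times and rates below are illustrative assumptions, not the paper's fitted model:

```python
def piecewise_failure_rate(t, t1=100.0, t2=1000.0):
    """Illustrative segmented failure rate (per hour), continuous at t1 and t2."""
    if t < t1:                               # burn-in: decreasing hazard
        return 0.01 + 0.04 * (1 - t / t1)
    elif t < t2:                             # useful life: constant hazard
        return 0.01
    else:                                    # wear-out: increasing hazard
        return 0.01 + 0.00005 * (t - t2)
```

No single Weibull distribution can reproduce this shape, which is the motivation for fitting each interval with its own segment and estimating the segment parameters jointly.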
In this paper, a zero-and-one-inflated Poisson (ZOIP) model is studied. The maximum likelihood estimation and the Bayesian estimation of the model parameters are obtained based on the data augmentation method. A simulation study based on the proposed sampling algorithm is conducted to assess the performance of the proposed estimation for various sample sizes. Finally, two real data sets are analysed to illustrate the practicability of the proposed method.
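The ZOIP model mixes structural zeros and ones with an ordinary Poisson component. A minimal sketch of its probability mass function, under a standard parameterization assumed here for illustration (p0 and p1 are the structural zero and one probabilities, lam the Poisson mean):

```python
import math

def zoip_pmf(k, p0, p1, lam):
    """P(X = k) for a zero-and-one-inflated Poisson: with probability p0 the
    outcome is a structural zero, with p1 a structural one, and otherwise
    X ~ Poisson(lam)."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return (k == 0) * p0 + (k == 1) * p1 + (1 - p0 - p1) * poisson
```

The inflation makes P(X = 0) and P(X = 1) exceed their plain-Poisson values while the tail keeps the Poisson shape, which is the excess-zeros-and-ones pattern the data augmentation scheme exploits.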
Most studies of series systems assume the causes of failure are independent, which may not hold in practice. In this paper, dependent causes of failure are considered by using a Marshall-Olkin bivariate Weibull distribution. We derive four reference priors based on several grouping orders. Gibbs sampling combined with the rejection sampling algorithm and the Metropolis-Hastings algorithm is developed to obtain estimates of the unknown parameters. The proposed approach is compared with the maximum-likelihood method via simulation. We find that the root mean squared errors of the Bayesian estimates are much smaller for small sample sizes, and that the coverage probabilities of the Bayesian estimates are much closer to the nominal levels. Finally, a real data set is analysed for illustration.
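A Metropolis-Hastings update of the kind combined with Gibbs sampling above can be sketched generically: propose a random-walk move and accept it with the usual ratio. The toy target (a standard normal, up to a constant) and step size are illustrative assumptions; in the paper's sampler a step of this form would update a parameter whose full conditional is not available in closed form:

```python
import math
import random

def log_target(x):
    return -0.5 * x * x  # standard normal log-density, up to a constant

def mh_chain(n, step=1.0, seed=42):
    """Random-walk Metropolis-Hastings chain of length n."""
    random.seed(seed)
    x, out = 0.0, []
    for _ in range(n):
        prop = x + random.gauss(0.0, step)  # symmetric random-walk proposal
        # Accept with probability min(1, target(prop)/target(x)).
        if random.random() < math.exp(min(0.0, log_target(prop) - log_target(x))):
            x = prop
        out.append(x)  # on rejection the current state is repeated
    return out

draws = mh_chain(50_000)
```

Because the proposal is symmetric, the proposal densities cancel in the acceptance ratio, leaving only the target ratio; the chain's draws then recover the target's moments.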
In this work, we consider the problem of estimating the parameters and predicting the unobserved or removed ordered data for a progressive type II censored flexible Weibull sample. Frequentist and Bayesian analyses are adopted for the estimation and prediction problems. The likelihood method as well as Bayesian sampling techniques are applied for the inference problems. The point predictors and credible intervals of unobserved data based on an informative set of data are computed. Markov chain Monte Carlo sampling is performed to compare the methods so obtained, and one real data set is analyzed for illustrative purposes.
Bayesian hierarchical models have been widely used in modern statistical applications. To deal with data having complex structures, we propose a generalized hierarchical normal linear (GHNL) model which accommodates arbitrarily many levels, usual design matrices, and 'vanilla' covariance matrices. Objective hyperpriors can be employed for the GHNL model to express ignorance or to match frequentist properties, yet the common objective Bayesian approaches are infeasible or fraught with danger in hierarchical modelling. To tackle this issue, [Berger, J., Sun, D. & Song, C. (2020b). An objective prior for hyperparameters in normal hierarchical models. Journal of Multivariate Analysis, 178, 104606. https://doi.org/10.1016/jmva.2020.104606] proposed a particular objective prior and investigated its properties comprehensively. Posterior propriety is important in the choice of priors to guarantee the convergence of MCMC samplers. James Berger conjectured that the resulting posterior is proper for a hierarchical normal model with arbitrarily many levels, but a rigorous proof was not given. In this paper, we complete this story and provide user-friendly guidance. One main contribution of this paper is to propose a new technique for deriving an elaborate upper bound on the integrated likelihood, as well as a unified approach to checking posterior propriety for linear models. An efficient Gibbs sampling method is also introduced, which outperforms other sampling approaches considerably.
Funding (ZICG distribution paper): supported by the National Science, Research and Innovation Fund (NSRF), King Mongkut's University of Technology North Bangkok (Grant No. KMUTNB-FF-65-22).
Funding (RIS passive beamforming paper): supported by the National Natural Science Foundation of China (No. 62171484), Zhuhai Fundamental and Application Research (No. ZH22017003210006PWC), and the Fundamental Research Funds for the Central Universities (No. 21621420).
Funding (DR-Phrase LDA paper): supported by the Project of Industry and University Cooperative Research of Jiangsu Province, China (No. BY2019051). Ma, J. thanks the Jiangsu Eazytec Information Technology Company (www.eazytec.com) for their financial support.
Funding (UWA channel estimation paper): funded by the Excellent Youth Science Fund of Heilongjiang Province (Grant No. YQ2022F001).
Funding (segmented reliability paper): supported by the National Natural Science Foundation of China (Nos. 60672164, 60939003, 61079013, 60879001, 90000871), the Special Project on Humanities and Social Sciences of the Ministry of Education of China (No. 16JDGC008), the National Natural Science Funds and Civil Aviation Mutual Funds (Nos. U1533128 and U1233114), Study on Reusing Sketch User Interface Oriented Design Knowledge (No. 16KJA520003), and the Six Talent Peaks Project in Jiangsu Province (No. 2016-XYDXXJS-088).
Funding (ZOIP paper): supported by the National Natural Science Foundation of China (Nos. 11271136, 81530086, 11671303, 11201345), the 111 Project of China (No. B14019), the Natural Science Foundation of Zhejiang Province (No. LY15G010006), and the China Postdoctoral Science Foundation (No. 2015M572598).
Funding (Marshall-Olkin bivariate Weibull paper): supported by the Natural Science Foundation of China (Nos. 11671303 and 11201345), the Natural Science Foundation of Zhejiang Province (No. LY15G010006), and the China Postdoctoral Science Foundation (No. 2015M572598).
Funding (flexible Weibull paper): supported by the Natural Science Foundation of China (Nos. 11401341, 11271136 and 81530086), the 111 Project (No. B14019), the Natural Science Foundation of Fujian Province, China (Nos. 2015J05014, 2016J01681 and 2017N0029), the Scientific Research Training Program of Fujian Province Universities for Distinguished Young Scholars (2015), and the New Century Excellent Talents Support Project of Fujian Province Universities ([2016]23).
Funding (GHNL paper): supported by the National Natural Science Foundation of China (Grant No. 11671146).