Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11771146, 11831008, 81530086, 11771145, and 11871252; the 111 Project (B14019); and the Program of Shanghai Subject Chief Scientist under Grant No. 14XD1401600.
Abstract: Motivated by a medical study that seeks to analyze the relationship between growth curves and other variables and to measure the association among multiple growth curves, the authors develop a functional multiple-outcome model that decomposes the total variation of multiple functional outcomes into variation explained by independent variables with time-varying coefficient functions, by latent factors, and by noise. The latent factors are hidden common factors that influence the multiple outcomes and are found through the combined functional principal component analysis approach. Through the coefficients of the latent factors, one may further explore the association among the multiple outcomes. The method is applied to multivariate growth data on infants from a real medical study in Shanghai and produces interpretable results. Convergence rates for the proposed estimates of the varying-coefficient and covariance functions of the model are derived under mild conditions.
Funding: Supported by the National Natural Science Foundation of China (project numbers 11771146, 11831008, 81530086, 11771145); the National Social Science Foundation Key Program (17ZDA091); the 111 Project (B14019); the Program of Shanghai Subject Chief Scientist (14XD1401600); and the China Postdoctoral Science Foundation (2018M630393).
Abstract: The unified weighting scheme for the local-linear smoother in functional data analysis can handle data that are dense, sparse, or of neither type. In this paper, we focus on the convergence rate of functional principal component analysis using this method. Almost sure asymptotic consistency and rates of convergence for the estimators of eigenvalues and eigenfunctions are established. We also provide the convergence rate of the variance estimator of the measurement error. Based on these results, the number of observations within each curve can be of any rate relative to the sample size, which is consistent with earlier conclusions about the asymptotic properties of the mean and covariance estimators.
Funding: Supported by the China Postdoctoral Science Foundation under Grant No. 2019M651422; the National Natural Science Foundation of China under Grant Nos. 71701127, 11831008, and 11971171; the National Social Science Foundation Key Program under Grant No. 17ZDA091; the 111 Project of China under Grant No. B14019; the Natural Science Foundation of Shanghai under Grant Nos. 17ZR1409000 and 20ZR1423000; and the Project of Humanities and Social Science Foundation of the Ministry of Education under Grant No. 20YJC910003.
Abstract: Hazard rate estimation under right censoring has been investigated extensively. The integrated square error (ISE) is one of the most widely accepted measures of global performance for nonparametric kernel estimators. However, no results have so far been available for the ISE of hazard rate estimation under a right-censored model with censoring indicators missing at random (MAR). This paper constructs an imputation estimator of the hazard rate function and establishes asymptotic normality of the ISE for the kernel hazard rate estimator with censoring indicators MAR. An asymptotic representation of the mean integrated square error (MISE) is also presented. The finite-sample behavior of the estimator is investigated via a simple simulation.
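For orientation, the classical kernel hazard estimator that this kind of work starts from can be sketched as below. This is a minimal illustration that assumes every censoring indicator is observed; it is not the imputation estimator of the paper. The function name `kernel_hazard`, the Epanechnikov kernel, the bandwidth, and the simulated exponential data are all illustrative assumptions.

```python
import random

def kernel_hazard(t, times, events, bandwidth):
    """Kernel-smoothed Nelson-Aalen increments at time t.

    Assumes fully observed censoring indicators (events[i] is 1 for an
    observed failure, 0 for a censored time) and no tied times.
    """
    pairs = sorted(zip(times, events))
    n = len(pairs)
    total = 0.0
    for rank, (time_i, event_i) in enumerate(pairs):
        at_risk = n - rank                      # subjects still under observation
        u = (t - time_i) / bandwidth
        if abs(u) < 1.0:                        # Epanechnikov kernel support
            total += 0.75 * (1.0 - u * u) * event_i / at_risk
    return total / bandwidth

# Simulated example: exponential lifetimes (true hazard 1) with exponential censoring.
rng = random.Random(0)
n = 2000
lifetimes = [rng.expovariate(1.0) for _ in range(n)]
censoring = [rng.expovariate(0.5) for _ in range(n)]
times = [min(x, c) for x, c in zip(lifetimes, censoring)]
events = [1 if x <= c else 0 for x, c in zip(lifetimes, censoring)]
print(kernel_hazard(0.5, times, events, bandwidth=0.3))
```

With a constant true hazard, the printed estimate should lie close to 1; the ISE studied in the abstract is the integral of the squared difference between such an estimate and the true hazard over a time interval.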
Abstract: In this thesis, we establish non-linear wavelet density estimators and study their asymptotic properties with data missing at random when covariates are present. The outstanding advantage of the non-linear wavelet method is that it can estimate unsmooth functions, which classical kernel estimation cannot do. We also study the large-sample properties of the ISE for the hazard rate estimator.
Abstract: We propose two simple regression models for the Pearson correlation coefficient of two normal responses or two binary responses to assess the effect of covariates of interest. Likelihood-based inference is established to estimate the regression coefficients, upon which a bootstrap-based method is used to test the significance of the covariates of interest. Simulation studies show the effectiveness of the method in terms of type-I error control, power in moderate sample sizes, and robustness to model misspecification. We illustrate the application of the proposed method with real data on health measurements.
Funding: Supported by the National Key Research and Development Program of China (2015AA020108); the National Natural Science Foundation of China (31671377, 81671326); the Shanghai Municipal Science and Technology Major Project (2017SHZDZX01); the Open Research Fund of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (East China Normal University), Ministry of Education; the Fundamental Research Funds for the Central Universities, Beihang University & Capital Medical University Advanced Innovation Center for Big Data-Based Precision Medicine Plan (BHME-201804, BHME-201904); and the Special Fund of the Pediatric Medical Coordinated Development Center of Beijing Hospitals.
Abstract: Different psychiatric disorders share genetic relationships and pleiotropic loci to a certain extent. We integrated and analyzed datasets related to major depressive disorder (MDD), bipolar disorder (BIP), and schizophrenia (SCZ) from the Psychiatric Genomics Consortium using multi-trait analysis of genome-wide association studies (MTAG). MTAG significantly increased the effective sample size from 99,773 to 119,754 for MDD, from 909,061 to 1,450,972 for BIP, and from 856,677 to 940,613 for SCZ. After fine-mapping, we discovered 7, 32, and 43 novel lead single nucleotide polymorphisms (SNPs) and 1, 6, and 3 novel causal SNPs for MDD, BIP, and SCZ, respectively. We identified rs8039305 in the FURIN gene as a novel pleiotropic locus across the three disorders. We performed gene-set analysis based on multi-marker analysis of genomic annotation (MAGMA) and Hi-C-coupled MAGMA (H-MAGMA) and identified 101 genes associated with the three disorders, which were enriched in the regulation of postsynaptic membranes, postsynaptic membrane dopaminergic synapses, and the Notch signaling pathway. Next, we performed Mendelian randomization analysis using different tools and detected a causal effect of BIP on SCZ. Overall, we demonstrated the use of combined summary statistics from genome-wide association studies for exploring potential novel mechanisms of the three psychiatric disorders, providing an alternative approach to integrating publicly available summary data.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11801355, 11871376, and 11971116, and the Shanghai Pujiang Program (18PJ1409800).
Abstract: The authors propose a two-step test for the two-sample problem of processes of Ornstein-Uhlenbeck type. In the first step, the authors test the equality of correlation structures based on the least squares estimators of the correlation parameters; the test statistic follows the standard normal distribution. If the null hypothesis is not rejected in the first step, the authors proceed to a second step that tests the equality of marginal distributions based on the weighted deviation of the empirical characteristic functions. This test statistic has a complicated asymptotic distribution, so a sequential bootstrap method is applied to reach a temporary decision. Simulation studies and real data analysis suggest that the proposed approach performs well in finite samples.
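To make the first step more concrete, the sketch below estimates the correlation parameter of a discretely sampled Ornstein-Uhlenbeck process by least squares. It is only an illustration under stated assumptions: the process is taken to be Gaussian with unit stationary variance, the lag-one autocorrelation is exp(-rate * step), and the names `simulate_ou` and `ls_rate` are hypothetical. The abstract does not specify the estimator's exact form, so this should not be read as the authors' procedure.

```python
import math
import random

def simulate_ou(n, rate, step, rng):
    """Exact discretisation of a stationary Gaussian OU process (unit variance)."""
    rho = math.exp(-rate * step)                 # lag-one autocorrelation
    innovation_sd = math.sqrt(1.0 - rho * rho)
    path = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        path.append(rho * path[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return path

def ls_rate(path, step):
    """Least squares fit of x[k+1] on x[k], mapped back to the mean-reversion rate."""
    num = sum(a * b for a, b in zip(path[:-1], path[1:]))
    den = sum(a * a for a in path[:-1])
    rho_hat = num / den                          # must be positive for the log below
    return -math.log(rho_hat) / step

rng = random.Random(1)
sample_a = simulate_ou(20000, rate=1.0, step=0.1, rng=rng)
sample_b = simulate_ou(20000, rate=1.0, step=0.1, rng=rng)
print(ls_rate(sample_a, 0.1), ls_rate(sample_b, 0.1))
```

A two-sample comparison would standardise the difference between the two estimates; the variance formula needed for that step is not given in the abstract and is deliberately left out here.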
Funding: Supported by the National Natural Science Foundation of China (No. 11971149).
Abstract: Firstly, this paper proposes a generalized log-determinant optimization model for estimating high-dimensional sparse inverse covariance matrices. Under the normality assumption, the zero components of the inverse covariance matrix represent conditional independence between pairs of variables given all the other variables. Because of its eigenvalue-bound constraints, the generalized model covers a large number of existing estimators as special cases. Secondly, rather than directly tackling the challenging optimization problem, this paper uses two alternating direction methods of multipliers (ADMM) to solve its dual model, which contains five separable structures. The first algorithm is based on a single Gauss–Seidel iteration, but its convergence is not theoretically guaranteed. In contrast, the second algorithm employs the symmetric Gauss–Seidel (sGS) based ADMM, which is equivalent to the 2-block iterative scheme by the latest sGS decomposition theorem. Finally, numerical simulations on synthetic data and a real data set show that both algorithms are very effective in estimating high-dimensional sparse inverse covariance matrices.
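The link between zeros of the inverse covariance matrix and conditional independence can be checked on a small hand-built example. The sketch below is illustrative only: the 3x3 matrices and the helper names `matmul` and `partial_correlation` are assumptions chosen for the demonstration, and nothing here implements the ADMM algorithms of the paper.

```python
import math

def matmul(a, b):
    """Plain matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def partial_correlation(precision, i, j):
    """Correlation of variables i and j given all others, read off the precision matrix."""
    return -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])

# A chain structure: variable 0 and variable 2 interact only through variable 1.
precision = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]
# Its inverse, written out by hand.
covariance = [[0.75, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 0.75]]

identity = matmul(precision, covariance)          # confirms the two matrices are inverses
marginal_02 = covariance[0][2] / math.sqrt(covariance[0][0] * covariance[2][2])
partial_02 = partial_correlation(precision, 0, 2)
print(identity, marginal_02, partial_02)
```

Variables 0 and 2 are marginally correlated (the covariance entry is nonzero) yet conditionally uncorrelated given variable 1, which is why sparsity is imposed on the inverse covariance matrix and not on the covariance matrix itself.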
Funding: Supported by the 111 Project (grant number B14019) and the National Natural Science Foundation of China (grant numbers 11571113, 11601157, 11601320).
Abstract: This paper studies the proportional reinsurance/new business and investment problem under the mean-variance criterion in a continuous-time setting. The strategies are constrained to the non-negative cone, and all coefficients in the model except the interest rate are stochastic processes adapted to the filtration generated by a Markov chain. With the help of a backward stochastic differential equation driven by the Markov chain, we obtain the optimal strategy and optimal cost explicitly under this non-Markovian regime-switching model. The cases with one risky asset and with a Markov regime-switching model are considered as special cases.
Abstract: We propose two variable selection methods for multivariate linear regression with high-dimensional covariates. The first method uses a multiple correlation coefficient to quickly reduce the dimension of the relevant predictors to a moderate or low level. The second method extends the univariate forward regression of Wang [(2009). Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, 104(488), 1512–1524. https://doi.org/10.1198/jasa.2008.tm08516] in a unified way such that variable selection and model estimation can be carried out simultaneously. We establish the sure screening property for both methods. Simulations and real data applications are presented to show the finite-sample performance of the proposed methods in comparison with some naive methods.
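The univariate forward regression idea that the second method extends can be sketched as a greedy search: at each step, add the predictor that most reduces the residual sum of squares. The code below is a minimal single-response illustration, not the multivariate extension proposed in the paper. The function name `forward_regression`, the fixed number of steps (used here in place of a data-driven stopping rule), and the simulated data are all assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward_regression(columns, y, n_steps):
    """Greedy forward selection for one response; columns is a list of predictor columns."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]              # centring absorbs the intercept
    cols = [[v - sum(col) / n for v in col] for col in columns]
    selected = []
    for _ in range(n_steps):
        best_j, best_gain = None, -1.0
        for j, col in enumerate(cols):
            norm2 = dot(col, col)
            if j in selected or norm2 <= 1e-12:
                continue
            gain = dot(resid, col) ** 2 / norm2  # drop in RSS if predictor j is added
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        chosen = cols[best_j]
        norm2 = dot(chosen, chosen)
        slope = dot(resid, chosen) / norm2
        resid = [r - slope * c for r, c in zip(resid, chosen)]
        # Orthogonalise the remaining predictors against the one just chosen.
        for k, col in enumerate(cols):
            if k != best_j and k not in selected:
                g = dot(col, chosen) / norm2
                cols[k] = [v - g * c for v, c in zip(col, chosen)]
        selected.append(best_j)
    return selected

# Simulated example: 30 predictors, only columns 0, 3 and 7 matter.
rng = random.Random(0)
n, p = 200, 30
X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [3.0 * X[0][i] - 3.0 * X[3][i] + 3.0 * X[7][i] + rng.gauss(0.0, 1.0)
     for i in range(n)]
print(sorted(forward_regression(X, y, 3)))
```

Because the residual and the remaining predictors are kept orthogonal to the selected set, each gain equals the exact reduction in residual sum of squares from refitting with the candidate added, so no repeated least squares fits are needed.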
Funding: Supported by the 111 Project of China (No. B14019) and the National Natural Science Foundation of China (Grant No. 11671146).
Abstract: Suppose that we observe y | θ, τ ∼ N_p(Xθ, τ^{−1} I_p), where θ is an unknown vector with unknown precision τ. Estimating the regression coefficient θ with known τ has been well studied. However, statistical properties such as admissibility in estimating θ with unknown τ are not well studied. Han [(2009). Topics in shrinkage estimation and in causal inference (PhD thesis). Wharton School, University of Pennsylvania] appears to be the first to consider the problem, developing sufficient conditions for the admissibility of estimating means of multivariate normal distributions with unknown variance. We generalise the sufficient conditions for admissibility and apply these results to the normal linear regression model. For 2-level and 3-level hierarchical models with unknown precision τ, we investigate when a standard class of hierarchical priors leads to admissible estimators of θ under the normalised squared error loss. One reason to consider this problem is the importance of admissibility in hierarchical prior selection, and we expect that our study could provide some reference for choosing hierarchical priors.
Funding: Supported by the Natural Science Foundation of China (Nos. 11271136, 81530086) and the 111 Project of China (No. B14019).
Abstract: The mixture cure model is the most popular model for analysing a major event with a potential cure fraction. In the real world, however, there may also be a potential risk from other non-curable competing events. In this paper, we study the accelerated failure time model with a mixture cure model via kernel-based nonparametric maximum likelihood estimation, allowing for a non-curable competing risk. An EM algorithm is developed to calculate the estimates of both the regression parameters and the unknown error densities, in which a kernel-smoothed conditional profile likelihood is maximised in the M-step; the resulting estimates are consistent. The method's performance is demonstrated through comprehensive simulation studies. Finally, the proposed method is applied to colorectal clinical trial data.