Abstract: Fisher-Tippett-Gnedenko classical theory shows that the normalized maximum of n iid random variables with distribution F, belonging to a very wide class of functions, converges in law to an extremal distribution H that is determined by the tail of F. Extensions of this theory from the iid case to stationary and weakly dependent sequences are well known from the work of Leadbetter, Lindgren and Rootzén. In this paper, we present a very simple class of random processes that ranges from iid sequences to non-stationary and strongly dependent processes, and we study the asymptotic behavior of its normalized maximum. More interestingly, we show that when the process is strongly dependent, the asymptotic distribution is no longer an extremal one, but a mixture of extremal distributions. We present very simple theoretical and simulated examples of this result. This provides a simple framework for asymptotic approximations of extreme values not covered by classical extremal theory and its well-known extensions.
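A minimal Monte Carlo sketch of the classical iid case the abstract builds on (not the strongly dependent process studied in the paper): for iid Exp(1) variables, the centred maximum M_n - log n converges in law to the Gumbel distribution, one of the three extremal laws of the Fisher-Tippett-Gnedenko theorem.

```python
import numpy as np

def normalized_block_maxima(block_size, n_blocks, rng):
    """Maxima of iid Exp(1) blocks, centred by log(block_size).

    For iid exponentials, M_n - log n converges in law to the
    Gumbel distribution (a special case of the
    Fisher-Tippett-Gnedenko theorem)."""
    samples = rng.exponential(size=(n_blocks, block_size))
    return samples.max(axis=1) - np.log(block_size)

rng = np.random.default_rng(0)
maxima = normalized_block_maxima(2000, 5000, rng)
# The Gumbel mean is the Euler-Mascheroni constant (~0.5772)
print(maxima.mean(), maxima.std())
```

The empirical mean and standard deviation of the centred maxima should be close to the Gumbel values γ ≈ 0.5772 and π/√6 ≈ 1.2825.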
Abstract: In this article we improve a goodness-of-fit test, of the Kolmogorov-Smirnov type, for equally distributed, but not stationary, strongly dependent data. The test is based on the asymptotic behavior of the empirical process, which is much more complex than in the classical case. Applications to simulated data and a discussion of the obtained results are provided. This is, to the best of our knowledge, the first result providing a general goodness-of-fit test for non-weakly dependent data.
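For reference, a sketch of the classical one-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)| that the article's test generalizes; the article's contribution is the limit behavior of the empirical process under strong dependence, which this iid computation does not capture.

```python
import numpy as np

def ks_statistic(data, cdf):
    """Classical one-sample Kolmogorov-Smirnov statistic
    D_n = sup_x |F_n(x) - F(x)|, computed at the jump points
    of the empirical distribution function F_n."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)   # F_n above F
    d_minus = np.max(f - np.arange(n) / n)         # F_n below F
    return max(d_plus, d_minus)

rng = np.random.default_rng(1)
sample = rng.uniform(size=500)
print(ks_statistic(sample, lambda x: x))  # small for a correctly specified model
```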
Abstract: In this paper, we provide a method based on quantiles to estimate the parameters of a finite mixture of Fréchet distributions, for a large sample of strongly dependent data. This is a situation that appears when dealing with environmental data, and there was a real need for such a method. We validate our approach by means of estimation and goodness-of-fit testing over simulated data, showing accurate performance.
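A sketch of the quantile idea for a single two-parameter Fréchet law F(x) = exp(-(x/s)^(-α)): matching two empirical quantiles to the closed-form quantile function x_p = s(-log p)^(-1/α) yields the parameters without moments. The article treats a finite *mixture* of Fréchet laws under strong dependence; the probability levels p1, p2 below are illustrative choices, not the paper's.

```python
import numpy as np

def frechet_quantile_fit(data, p1=0.25, p2=0.75):
    """Fit F(x) = exp(-(x/s)**(-alpha)) by matching two empirical
    quantiles. From log x_p = log s - (1/alpha) * log(-log p),
    two quantile levels give alpha, then s."""
    q1, q2 = np.quantile(data, [p1, p2])
    alpha = (np.log(-np.log(p1)) - np.log(-np.log(p2))) / (np.log(q2) - np.log(q1))
    s = q1 * (-np.log(p1)) ** (1.0 / alpha)
    return alpha, s

# Check on synthetic iid Frechet data with alpha = 3, s = 2
rng = np.random.default_rng(2)
u = rng.uniform(size=200_000)
data = 2.0 * (-np.log(u)) ** (-1.0 / 3.0)   # inverse-CDF sampling
print(frechet_quantile_fit(data))
```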
Funding: This work was supported by the Universities Natural Science Research Project of Jiangsu Province under Grants 20KJB520026 and 20KJA520002, the Foundation for Young Teachers of Nanjing Audit University under Grant 19QNPY018, and the National Natural Science Foundation of China under Grants 71972102 and 61902189.
Abstract: With the continuous expansion of software applications, people's requirements for software quality are increasing. Software defect prediction is an important technology for improving software quality. It typically encodes the software into several features and applies machine learning methods to build defect prediction classifiers, which can estimate whether a software area is clean or buggy. However, current encoding methods are mainly based on traditional manual features or the Abstract Syntax Tree (AST) of the source code. Traditional manual features struggle to reflect the deep semantics of programs, and there is a lot of noise in the AST, which affects the expression of semantic features. To overcome these deficiencies, we combine Convolutional Neural Networks (CNN) with a novel compiler Intermediate Representation (IR) based program encoding method for software defect prediction (CIR-CNN). Specifically, our program encoding method is based on the compiler IR, which eliminates a large amount of noise in the syntactic structure of the source code and facilitates the acquisition of more accurate semantic information. Secondly, with the help of data-flow analysis, a Data Dependency Graph (DDG) is constructed on the compiler IR, which helps to capture deeper semantic information about the program. Finally, we use the widely used CNN model to build a software defect prediction model, which increases the adaptability of the method. To evaluate the performance of CIR-CNN, we use seven projects from the PROMISE datasets to set up comparative experiments. The experimental results show that, in within-project defect prediction (WPDP), our CIR-CNN method improves prediction accuracy by 12% over the AST-encoded CNN-based model and by 20.9% over the traditional features-based LR model. In cross-project defect prediction (CPDP), it improves on the AST-encoded DBN-based model by 9.1% and on the traditional features-based TCA+ model by 19.2%.
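The shape of the CNN feature-extraction step described above can be sketched in plain numpy: embed a sequence of IR tokens, slide 1-D convolution filters over it, and max-pool into a fixed-length vector for a downstream classifier. The weights here are random and the token vocabulary is invented; only the structure of the computation is illustrated, not the paper's trained CIR-CNN model.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_encode(token_ids, vocab, emb_dim=8, n_filters=4, width=3):
    """Embed a token sequence, apply 1-D convolution filters,
    then ReLU and global max-pooling to get a fixed-length
    program representation (random, untrained weights)."""
    emb = rng.normal(size=(vocab, emb_dim))          # token embeddings
    filt = rng.normal(size=(n_filters, width, emb_dim))
    x = emb[token_ids]                               # (seq_len, emb_dim)
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    conv = np.einsum('swd,fwd->sf', windows, filt)   # (seq_len - width + 1, n_filters)
    return np.maximum(conv, 0).max(axis=0)           # ReLU + global max-pool

features = cnn_encode(np.array([3, 1, 4, 1, 5, 9, 2, 6]), vocab=10)
print(features.shape)   # fixed-length vector regardless of sequence length
```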
Abstract: The performance of scalable shared-memory multiprocessors suffers from three types of latency: memory latency, latency caused by inter-process synchronization, and latency caused by instructions that take multiple cycles to produce results. To tolerate these three types of latency, the following techniques were proposed to be coupled: coarse-grained multithreading, a superscalar processor, and a reconfigurable device, namely overlapping the long-latency operations of one thread of computation with the execution of other threads. The superscalar processor principle is used to tolerate instruction latency by issuing several instructions simultaneously. The DPGA is coupled with this processor in order to improve context switching.
Abstract: Data fusion is one of the attractive topics in sonar signal processing. Decision-level data fusion for a multi-sensor (multi-array) system is described in this paper. Following the discussion in Ref. [1], the optimum linear data fusion algorithm for N dependent observations is derived. It is proved that the estimation error of data fusion is not greater than that of the individual components. Expressions for the estimation error and the weight coefficients are presented, and the results of numerical calculations and some examples are illustrated. The effect of the dependence of the observation data on the final estimation error is presented.
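The core inequality can be illustrated with the standard minimum-variance unbiased (BLUE) combination of N dependent, unbiased local estimates with error covariance Σ: weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1) give fused variance 1/(1ᵀΣ⁻¹1), which never exceeds any diagonal entry of Σ. This is a textbook sketch of the idea, not the paper's specific sonar derivation.

```python
import numpy as np

def fuse(cov):
    """Minimum-variance unbiased linear fusion of N dependent,
    unbiased local estimates with error covariance `cov`.
    Returns the weights (summing to 1) and the fused variance."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)   # proportional to inv(cov) @ 1
    w /= ones @ w                    # normalize so the estimate stays unbiased
    fused_var = w @ cov @ w          # equals 1 / (1' inv(cov) 1)
    return w, fused_var

# Two correlated sensors: the fused error never exceeds the best sensor's
cov = [[1.0, 0.3],
       [0.3, 0.5]]
w, v = fuse(cov)
print(w, v)   # v <= 0.5, the smaller individual variance
```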
Funding: Supported by the National Numerical Wind Tunnel project NNW2019ZT6-B18 and Guangdong Introducing Innovative & Entrepreneurial Teams under Grant No. 2016ZT06D211.
Abstract: In the unstructured finite volume method, loops over different mesh components, such as cells, faces, and nodes, are widely used for data traversal. A mesh loop results in direct or indirect data access, which affects data locality significantly, and many threads accessing the same data lead to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80. Numerical tests at different mesh scales show that the effects of the mesh loop modes on data locality and data dependence differ. Specifically, the face loop yields the best data locality whenever kernels access face data. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data are used in computation without face data, and it performs best when only indirect access to cell data exists in kernels. Atomic operations reduce kernel performance considerably on the K80, an effect that is not obvious on the V100. With the suitable mesh loop mode in all kernels, the overall performance of the GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
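The face-loop/cell-loop trade-off can be sketched on a toy periodic 1-D mesh (face i between owner cell i and neighbour cell (i+1) % n). The two traversals compute the same per-cell flux balance but with opposite hazards: the face loop scatters into shared cells (needing atomics on a GPU), while the cell loop gathers with no write conflicts but indirect reads. This is an illustrative analogue in serial numpy, not the paper's CFD kernels.

```python
import numpy as np

def face_loop_div(face_cells, flux, n_cells):
    """Face loop: every face scatters its flux into both adjacent
    cells. Threads mapped to faces would race on shared cells, so a
    GPU kernel needs atomic adds; np.add.at is the serial analogue
    of that unordered scatter."""
    div = np.zeros(n_cells)
    np.add.at(div, face_cells[:, 0], flux)    # owner side
    np.add.at(div, face_cells[:, 1], -flux)   # neighbour side
    return div

def cell_loop_div(flux):
    """Cell loop: each cell gathers its own faces (right face minus
    left face on this periodic 1-D mesh). No write conflicts, since
    one thread owns one cell, at the cost of indirect face reads."""
    return flux - np.roll(flux, 1)

face_cells = np.array([[0, 1], [1, 2], [2, 0]])
flux = np.array([2.0, -1.0, 0.5])
print(face_loop_div(face_cells, flux, 3))
print(cell_loop_div(flux))   # identical result, different access pattern
```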
Funding: This research is partially supported by the National Science Foundation [grant number OIA-1301789].
Abstract: This paper focuses on the influence of a misspecified covariance structure on the false discovery rate for the large-scale multiple testing problem. Specifically, we evaluate the influence on the marginal distribution of local false discovery rate statistics, which are used in many multiple testing procedures and are related to Bayesian posterior probabilities. Explicit forms of the marginal distributions under both correctly specified and incorrectly specified models are derived. The Kullback-Leibler divergence is used to quantify the influence caused by a misspecification. Several numerical examples are provided to illustrate the influence, and a real spatio-temporal dataset on soil humidity is discussed.
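As background for the statistic studied above, the local false discovery rate under a two-group Gaussian mixture f(z) = π₀·N(0,1) + (1-π₀)·N(μ₁, σ₁²) is lfdr(z) = π₀ f₀(z) / f(z). A misspecified covariance structure distorts the null density f₀ and hence this ratio, which is the effect the paper quantifies. The mixture parameters below are illustrative assumptions, not from the paper.

```python
import numpy as np

def local_fdr(z, pi0, mu1, sigma1=1.0):
    """Local false discovery rate lfdr(z) = pi0 * f0(z) / f(z)
    under a two-group Gaussian mixture with null N(0,1) and
    alternative N(mu1, sigma1**2)."""
    z = np.asarray(z, dtype=float)
    f0 = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-0.5 * ((z - mu1) / sigma1) ** 2) / (sigma1 * np.sqrt(2 * np.pi))
    f = pi0 * f0 + (1 - pi0) * f1
    return pi0 * f0 / f

# Near the null (z = 0) the lfdr is close to 1; far in the
# alternative's direction (z = 4) it drops toward 0.
print(local_fdr([0.0, 4.0], pi0=0.9, mu1=3.0))
```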