This study aims to reveal the spatial structural characteristics of 1,652 ethnic-minority villages (EMVs) in China and to analyze the mechanisms driving their spatial heterogeneity. EMVs are a special type of settlement space that preserves numerous historical traces of the ethnic cultures of ancient China. They are important carriers of China's traditional culture and are key to implementing rural revitalization strategies. In this study, 1,652 EMVs in China were selected as research subjects. The nearest neighbor index, kernel density estimation, and spatial autocorrelation indices were employed to reveal the spatial structural characteristics of the villages. Neural network models, spatial lag models, and geographical detectors were used to analyze the formation mechanism of spatial heterogeneity in EMVs. The results indicate that: (1) EMVs exhibit significant spatial differentiation, characterized by a "single core with multiple surrounding sub-centers," "polarization between east and west," "quantity decreasing from the southwest to the east coast, northeast, and northwest," and "large dispersion with small agglomeration." (2) EMVs are mainly distributed in areas rich in intangible cultural heritage, with high vegetation coverage and low altitude, far from central cities, with limited arable land and underdeveloped economies and transportation, particularly on shaded slopes or riverbanks. (3) Distance from the nearest river (X3), distance from central cities (X8), national intangible cultural heritage (X9), and NDVI (X10) were the main driving factors affecting the spatial distribution of EMVs, whereas elevation (X1) and GDP (X5) had the weakest influence. As EMVs form a relatively unique territorial spatial unit, identifying their spatial heterogeneity characteristics not only deepens the research content of settlement geography, but also informs the assessment, protection, and development of minority villages, which is of great significance for the inheritance and utilization of ethnic cultures in the present era.
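As a companion to the point-pattern measures named above, here is a minimal sketch of the average nearest neighbor index under complete spatial randomness (CSR). The toy grid and study-area value are illustrative, not the EMV data:

```python
import numpy as np

def nearest_neighbor_index(points, area):
    """Average Nearest Neighbor Index: <1 suggests clustering, ~1 a random
    pattern, >1 dispersion. points: (n, 2) planar coordinates; area: extent."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Distance from each point to its nearest neighbor (self excluded)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)  # expected mean NN distance under CSR
    return observed / expected

# Toy check: a regular grid should come out dispersed (index > 1)
grid = np.array([(x, y) for x in range(5) for y in range(5)], dtype=float)
print(nearest_neighbor_index(grid, area=25.0))
```

The O(n²) distance matrix is fine for a sketch; for 1,652 villages a KD-tree query would be the idiomatic choice.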
In this work, four empirical models of statistical thickness, namely the Harkins and Jura, Halsey, Carbon Black, and Jaroniec models, were compared in order to determine the textural properties (external surface area and micropore surface area) of a clay concrete without molasses and of clay concretes stabilized with 8%, 12% and 16% molasses. The results show that the Halsey model can be used to obtain the external surface areas. However, it does not yield the micropore surface area and is therefore not suitable for plain clay concrete (without molasses) or for clay concretes stabilized with molasses. The Carbon Black, Jaroniec, and Harkins and Jura models can be used for both plain and stabilized clay concrete. The Carbon Black model is the most relevant for plain clay concrete, while the Harkins and Jura model is the most relevant for molasses-stabilized clay concrete. These two models are promising for future research.
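For reference, the two most widely used statistical-thickness equations can be sketched as follows (film thickness in ångströms as a function of relative pressure, with the standard textbook coefficients; these are not fitted to the clay-concrete data):

```python
import math

def t_harkins_jura(x):
    """Harkins-Jura statistical thickness (angstrom) at relative pressure x = P/P0."""
    return math.sqrt(13.99 / (0.034 - math.log10(x)))

def t_halsey(x):
    """Halsey thickness equation (angstrom) at relative pressure x = P/P0."""
    return 3.54 * (5.0 / -math.log(x)) ** (1.0 / 3.0)

for x in (0.2, 0.5, 0.8):
    print(f"P/P0={x}: Harkins-Jura {t_harkins_jura(x):.2f} A, Halsey {t_halsey(x):.2f} A")
```

In a t-plot analysis, the adsorbed volume is regressed against t(P/P0); the slope gives the external surface area and the intercept the micropore volume, which is why the choice of thickness model matters.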
Background: Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study is to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter. Results: We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only live animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model, depending on the genotyping scenario. Genotyping all individuals, both dead and alive, yielded the highest accuracy. When an equal number of individuals (80%) were genotyped, a random sample of genotyped individuals achieved higher accuracy than genotyping only live individuals. The linear, logit, and probit models achieved similar accuracy. Conclusions: Genomic prediction of pig survival is feasible when only live pigs have genotypes, but genomic information from dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
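The similar accuracies of the logit and probit models are unsurprising, because the two link functions are nearly proportional. A small sketch of that equivalence (the 1.6 rescaling factor is the classic rule of thumb, not a value from the paper):

```python
import numpy as np
from scipy.stats import norm

# Logit and probit links give nearly identical survival probabilities once the
# probit linear predictor is rescaled by ~1.6, which is one reason the two
# models rank animals, and hence predict breeding values, almost identically.
eta = np.linspace(-3, 3, 601)        # linear predictor (liability scale)
p_logit = 1.0 / (1.0 + np.exp(-eta))
p_probit = norm.cdf(eta / 1.6)       # classic logit ~ probit(eta / 1.6) approximation
max_gap = float(np.max(np.abs(p_logit - p_probit)))
print(f"max probability gap over [-3, 3]: {max_gap:.3f}")
```

Because breeding-value ranking depends only on the ordering of predictions, a monotone link substitution of this size barely changes prediction accuracy.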
A BLEVE (boiling liquid expanding vapor explosion) involves both the rapid vaporization of liquid and the rapid expansion of vapor in a vessel. The loss of containment results in a large fireball if the stored chemical is flammable. To predict the damage generated by a BLEVE, several authors have proposed analytical or semi-empirical correlations that predict the diameter and lifetime of the fireball from the quantity of fuel. These models are based on past experiments, which makes their validity dependent on the initial conditions and the nature of the product concerned. This article examines the uncertainty associated with analytical and semi-empirical models of the BLEVE fireball, exploring how uncertainties in the input data, and the choice of a more or less appropriate model, propagate into the model results. Statistical techniques such as global sensitivity analysis and uncertainty analysis are employed to quantify these uncertainties. An attempt is made to evaluate and select reasonable models available in the literature for characterizing fireballs and their consequences. The correlations were analyzed using statistical methods and BLEVE data (experimental data and data estimated by correlation) to determine the residual sum of squares (RSS) and average absolute deviation (AAD). The analysis revealed that the Center for Chemical Process Safety (CCPS), TNO (Netherlands Organization for Applied Scientific Research), and Gayle models showed the closest agreement between the experimental data and the values estimated by correlation.
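A sketch of how such correlations are scored with RSS and AAD. The power-law form D = a·M^b is the generic shape of these fireball correlations; the cube-root coefficients and the "observed" points below are purely illustrative, not the paper's data:

```python
import numpy as np

def fireball_diameter(mass_kg, a, b):
    """Generic power-law fireball correlation D = a * M**b (D in m, M in kg)."""
    return a * np.asarray(mass_kg, dtype=float) ** b

def rss(obs, pred):
    """Residual sum of squares between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sum((obs - pred) ** 2))

def aad(obs, pred):
    """Average absolute relative deviation (one common AAD definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred) / obs))

# Hypothetical experimental points (fuel mass in kg, observed diameter in m)
mass = [1000.0, 5000.0, 10000.0]
d_obs = [60.0, 105.0, 130.0]
d_pred = fireball_diameter(mass, a=5.8, b=1 / 3)  # illustrative cube-root scaling
print(f"RSS = {rss(d_obs, d_pred):.1f}, AAD = {aad(d_obs, d_pred):.3f}")
```

Ranking candidate correlations then amounts to computing these two statistics for each model against the same experimental set and preferring the smallest values.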
The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected from 26 stations. The methodological approach rests on the statistical modeling of maximum daily rainfall. Fits were performed for several sample sizes and several return periods (2, 5, 10, 20, 50 and 100 years). The main results show that the 30-year series (1931-1960; 1961-1990; 1991-2020) are better fitted by the Gumbel (26.92% - 53.85%) and Inverse Gamma (26.92% - 46.15%) models. The 60-year series (1931-1990; 1961-2020) are better fitted by the Inverse Gamma (30.77%), Gamma (15.38% - 46.15%) and Gumbel (15.38% - 42.31%) models. Over the full 90-year record (1931-2020), the Gumbel model dominates in 50% of cases, ahead of the Gamma (34.62%) and Inverse Gamma (15.38%) models. Overall, the Gumbel is the dominant model, particularly in wet periods, while data for periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma models.
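A minimal SciPy sketch of the fitting exercise, using a synthetic Gumbel sample in place of a station series; the location and scale values are arbitrary stand-ins:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic annual-maximum daily rainfall (mm), standing in for one 60-year series
sample = stats.gumbel_r.rvs(loc=80, scale=20, size=60, random_state=rng)

# Maximum-likelihood fit, then quantiles at the study's return periods
loc, scale = stats.gumbel_r.fit(sample)
levels = {T: stats.gumbel_r.ppf(1 - 1 / T, loc=loc, scale=scale)
          for T in (2, 5, 10, 20, 50, 100)}
for T, q in levels.items():
    print(f"{T:>3}-year return level: {q:.1f} mm")
```

Repeating the fit for competing distributions (e.g., `stats.gamma`, `stats.invgamma`) and comparing goodness-of-fit statistics per station is the essence of the model-selection percentages quoted above.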
This paper proposes a method to incorporate syntax-based language models into phrase-based statistical machine translation (SMT) systems. The syntax-based language model used in this paper is based on link grammar, a highly lexicalized formalism. In order to apply language models based on link grammar in phrase-based models, the concept of linked phrases, an extension of the concept of traditional phrases in phrase-based models, was introduced. Experiments were conducted, and the results showed that the use of syntax-based language models can greatly improve the performance of phrase-based models.
The spread of an advantageous mutation through a population is of fundamental interest in population genetics. While the classical Moran model is formulated for a well-mixed population, it has long been recognized that in real-world applications, the population usually has an explicit spatial structure which can significantly influence the dynamics. In the context of cancer initiation in epithelial tissue, several recent works have analyzed the dynamics of advantageous mutant spread on integer lattices, using the biased voter model from particle systems theory. In this spatial version of the Moran model, individuals first reproduce according to their fitness and then replace a neighboring individual. From a biological standpoint, the opposite dynamics, where individuals first die and are then replaced by a neighboring individual according to its fitness, are equally relevant. Here, we investigate this death-birth analogue of the biased voter model. We construct the process mathematically, derive the associated dual process, establish bounds on the survival probability of a single mutant, and prove that the process has an asymptotic shape. We also briefly discuss alternative birth-death and death-birth dynamics, depending on how the mutant fitness advantage affects the dynamics. We show that birth-death and death-birth formulations of the biased voter model are equivalent when fitness affects the former event of each update, whereas the birth-death model is fundamentally different from the death-birth model when fitness affects the latter event.
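A toy simulation of the death-birth dynamics described here, on a one-dimensional cycle rather than the integer lattices analyzed in the paper; the population size and fitness advantage are illustrative choices:

```python
import random

def death_birth_step(state, fitness, rng):
    """One death-birth update on a cycle: a uniformly chosen individual dies and
    is replaced by a copy of a neighbor chosen proportionally to fitness."""
    n = len(state)
    i = rng.randrange(n)                               # the site that dies
    nbrs = (state[(i - 1) % n], state[(i + 1) % n])
    state[i] = rng.choices(nbrs, weights=[fitness[t] for t in nbrs], k=1)[0]

rng = random.Random(0)
state = [0] * 30
state[15] = 1                                          # a single advantageous mutant
fitness = {0: 1.0, 1: 1.5}                             # type 1 carries the advantage
while 0 < sum(state) < len(state):                     # run until absorption
    death_birth_step(state, fitness, rng)
print("mutant fixed" if state[0] == 1 else "mutant went extinct")
```

Running many replicates and recording the fraction that fix gives a Monte Carlo estimate of the survival probability that the paper bounds analytically.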
Several statistical methods have been developed for analyzing genotype × environment (GE) interactions in crop breeding programs to identify genotypes with high yield and stability. Four statistical methods, including joint regression analysis (JRA), additive main effects and multiplicative interaction (AMMI) analysis, genotype plus GE interaction (GGE) biplot analysis, and the yield-stability (YSi) statistic, were used to evaluate GE interaction in 20 winter wheat genotypes grown in 24 environments in Iran. The main objective was to evaluate the rank correlations among the four statistical methods in genotype rankings for yield, stability, and yield-stability. Three kinds of genotypic ranks (yield ranks, stability ranks, and yield-stability ranks) were determined with each method. The results indicated the presence of GE interaction, suggesting the need for stability analysis. With respect to yield, the genotype rankings by the GGE biplot and AMMI analysis were significantly correlated (P < 0.01). For stability ranking, the rank correlations ranged from 0.53 (GGE-YSi; P < 0.05) to 0.97 (JRA-YSi; P < 0.01). AMMI distance (AMMID) was highly correlated (P < 0.01) with the variance of regression deviations (S2di) in JRA (r = 0.83) and with Shukla's stability variance (σ2) in YSi (r = 0.86), indicating that these stability indices can be used interchangeably. No correlation was found between yield ranks and stability ranks (AMMID, S2di, σ2, and the GGE stability index), indicating that these indices measure static stability and accordingly could be used if selection is based primarily on stability. For yield-stability, rank correlation coefficients among the statistical methods varied from 0.64 (JRA-YSi; P < 0.01) to 0.89 (AMMI-YSi; P < 0.01), indicating that AMMI and YSi were closely associated in ranking genotypes for integrating yield with stability performance. Based on the results, it can be concluded that YSi was closely correlated with (i) JRA in ranking genotypes for stability and (ii) AMMI in integrating yield and stability.
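The rank correlations reported above are Spearman coefficients between genotype rankings; a minimal sketch with hypothetical ranks (not the paper's data):

```python
from scipy.stats import spearmanr

# Hypothetical stability ranks of 8 genotypes under two methods (e.g., JRA vs. YSi)
jra_rank = [1, 2, 3, 4, 5, 6, 7, 8]
ysi_rank = [2, 1, 3, 5, 4, 6, 8, 7]

rho, p_value = spearmanr(jra_rank, ysi_rank)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

With no ties, this reproduces the textbook formula rho = 1 − 6·Σd²/(n(n²−1)), where d is the per-genotype rank difference between the two methods.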
[Objective] The study aimed to compare several statistical analysis models for estimating the genotypic stability of sugarcane (Saccharum spp.). [Method] Data from the 2009 sugarcane regional trials in Guangdong were analyzed with three models: the Finlay and Wilkinson model, the additive main effects and multiplicative interaction (AMMI) model, and the linear regression-principal components analysis (LR-PCA) model. [Result] The Finlay and Wilkinson model was simpler to apply, whereas the other two models provided more comprehensive analyses, with only slight differences between the AMMI and LR-PCA models. [Conclusion] In practice, the statistical method is usually chosen according to the data at hand, but the same data should also be analyzed with different statistical methods so that comparison yields a more reasonable result.
QTL mapping for seven quality traits was conducted using 254 recombinant inbred lines (RILs) derived from the japonica-japonica rice cross Xiushui 79/C Bao. The seven traits investigated were grain length (GL), grain length-to-width ratio (LWR), chalky grain rate (CGR), chalkiness degree (CD), gelatinization temperature (GT), amylose content (AC), and gel consistency (GC) of head rice. Three mapping methods were employed: composite interval mapping based on a mixed linear model in QTLMapper 2.0 (MCIM), inclusive composite interval mapping based on a stepwise regression linear model in QTL IciMapping 3.0 (ICIM), and multiple interval mapping with regression forward selection based on multiple regression analysis in Windows QTL Cartographer 2.5 (MIMR). Results showed that five QTLs with additive effects (A-QTLs) were detected by all three methods simultaneously, two by two methods simultaneously, and 23 by only one method. Five A-QTLs were detected by MCIM, nine by ICIM, and 28 by MIMR. The contribution rates of single A-QTLs ranged from 0.89% to 38.07%. None of the QTLs with epistatic effects (E-QTLs) detected by MIMR were detected by the other two methods. Fourteen pairs of E-QTLs were detected by both MCIM and ICIM, and 142 pairs were detected by only one method. Twenty-five pairs of E-QTLs were detected by MCIM, 141 pairs by ICIM, and four pairs by MIMR. The contribution rates of single pairs of E-QTLs ranged from 2.60% to 23.78%. In the Xiu-Bao RIL population, epistatic effects played the major role in the variation of GL and CD, additive effects were dominant in the variation of LWR, and both kinds of effects were of equal importance in the variation of CGR, AC, GT, and GC. QTLs detected by two or more methods simultaneously are highly reliable and could be applied to improve quality traits in japonica hybrid rice.
The water resources of the Nadhour-Sisseb-El Alem Basin in Tunisia are subject to semi-arid and arid climatic conditions. These induce excessive pumping of groundwater, which creates drops in the water level of about 1-2 m/a. Such unfavorable conditions require interventions to rationalize integrated management in decision making. The aim of this study is to determine a water recharge index (WRI), delineate the potential groundwater recharge area, and estimate the potential groundwater recharge rate based on the integration of statistical models derived from remote sensing imagery, GIS digital data (e.g., lithology, soil, runoff), measured artificial recharge data, fuzzy set theory, and multi-criteria decision making (MCDM) using the analytic hierarchy process (AHP). Eight factors affecting potential groundwater recharge were considered: lithology, soil, slope, topography, land cover/use, runoff, drainage, and lineaments. The WRI ranges between 1.2 and 3.1 and was classified into five classes (poor, weak, moderate, good, and very good sites of potential groundwater recharge). The very good and good classes occupied 27% and 44% of the study area, respectively. The potential groundwater recharge rate was 43% of total precipitation. According to the results of the study, river beds are favorable sites for groundwater recharge.
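The AHP step can be sketched as extracting criterion weights from a pairwise-comparison matrix via its principal eigenvector; the 3×3 matrix below is a toy example, not the study's eight-factor matrix:

```python
import numpy as np

def ahp_weights(pairwise):
    """Criterion weights as the normalized principal eigenvector of a
    reciprocal pairwise-comparison matrix (Saaty's AHP)."""
    A = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)          # Perron (principal) eigenvalue
    w = np.abs(vecs[:, k].real)
    return w / w.sum()

# Toy reciprocal matrix for three criteria (say, lithology vs. slope vs. drainage)
A = [[1,     3,     5],
     [1 / 3, 1,     3],
     [1 / 5, 1 / 3, 1]]
w = ahp_weights(A)
print(np.round(w, 3))
```

A WRI map is then a weighted overlay: each cell's index is the sum of its normalized factor scores multiplied by these weights.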
Forecasting the movement of the stock market is a long-standing attractive topic. This paper implements different statistical learning models to predict the movement of the S&P 500 index, which is influenced by other important financial indexes across the world, as well as by commodity prices and financial technical indicators. The paper systematically investigates four supervised learning models, namely Logistic Regression, Gaussian Discriminant Analysis (GDA), Naive Bayes, and the Support Vector Machine (SVM), in forecasting the S&P 500 index. After several rounds of optimization of features and models, in particular SVM kernel selection and per-model feature selection, the paper concludes that an SVM with a Radial Basis Function (RBF) kernel achieves an accuracy of 62.51% for the future market trend of the S&P 500 index.
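A minimal scikit-learn sketch of the winning configuration, an RBF-kernel SVM on standardized features; the synthetic classification data below stands in for the index features, so the printed accuracy is not the paper's 62.51%:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for daily up/down labels with technical-indicator features
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaling matters for RBF kernels, hence the pipeline
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

In a real market setting the train/test split would have to respect time order (no shuffling across dates) to avoid look-ahead bias.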
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly and introduces abilities, such as in-context learning, that smaller models lack. The advancement of Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in natural language processing, especially in text generation and language understanding. It then turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Recently, some results have been acquired with Monte Carlo statistical experiments in ocean engineering design. The results show that Monte Carlo statistical experiments can be widely used in estimating the parameters of wave statistical distributions, checking the probability model of the long-term wave extreme value distribution under typhoon conditions, and calculating the failure probability of ocean platforms.
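The parameter-estimation use case can be sketched as a Monte Carlo experiment: draw many synthetic samples from a known wave-height distribution (a Rayleigh model here, as an assumption) and examine the bias and spread of the estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
true_scale = 2.0  # Rayleigh scale parameter, a common short-term wave-height model

# Monte Carlo experiment: repeat the estimation on many synthetic samples
estimates = []
for _ in range(500):
    sample = rng.rayleigh(scale=true_scale, size=200)
    # Maximum-likelihood estimator of the Rayleigh scale: sqrt(mean(x^2) / 2)
    estimates.append(np.sqrt(np.mean(sample**2) / 2.0))

est = np.array(estimates)
print(f"mean estimate {est.mean():.3f}, std {est.std():.3f}")
```

The empirical spread of the 500 estimates quantifies the sampling uncertainty that analytical formulas would otherwise have to approximate.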
The establishment of effective null models can provide reference networks for accurately describing the statistical properties of real-life signed networks. At present, the two classical null models of signed networks (i.e., the sign-randomized and full-edge-randomized models) shuffle positive and negative topologies at the same time, so it is difficult to distinguish the effects on network topology of positive edges, negative edges, and the correlation between them. In this study, we construct three refined edge-randomized null models that randomize only link relationships, without changing the positive and negative degree distributions. The results for nontrivial statistical indicators of signed networks, such as average degree connectivity and the clustering coefficient, show that the position of positive edges has a stronger effect on positive-edge topology, while the signs of negative edges have a greater influence on negative-edge topology. For some specific statistics (e.g., embeddedness), the results indicate that the proposed null models describe real-life networks more accurately than the two existing ones, and they can be selected to facilitate a better understanding of the complex structures, functions, and dynamical behaviors of signed networks.
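For contrast with the refined models proposed here, the classical sign-randomized null model can be sketched in a few lines (a simple edge-list representation is assumed):

```python
import random

def sign_shuffled_null(edges, seed=0):
    """Classical sign-randomized null model: keep the topology fixed and
    permute edge signs, preserving the overall counts of + and - edges."""
    rng = random.Random(seed)
    signs = [s for _, _, s in edges]
    rng.shuffle(signs)
    return [(u, v, s) for (u, v, _), s in zip(edges, signs)]

edges = [(0, 1, +1), (1, 2, -1), (2, 3, +1), (3, 0, -1), (0, 2, +1)]
null = sign_shuffled_null(edges)
# Topology unchanged, sign multiset unchanged
print(sorted(s for _, _, s in null))  # → [-1, -1, 1, 1, 1]
```

The refined models of the study go further: they also constrain each node's positive and negative degrees, which this simple shuffle does not preserve.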
The cause-effect relationship is not always possible to trace in GCMs because of the simultaneous inclusion of several highly complex physical processes. Furthermore, inter-GCM differences are large and there is no simple way to reconcile them. Simple climate models, like statistical-dynamical models (SDMs), therefore appear useful in this context. This kind of model is essentially mechanistic, being directed toward understanding the dependence of a particular mechanism on the other parameters of the problem. In this paper, the utility of SDMs for studies of climate change is discussed in some detail. We show that these models are an indispensable part of the hierarchy of climate models.
This paper performs a critical analysis of the problems arising in matching the classical models of statistical and phenomenological thermodynamics. The analysis shows that some concepts of the statistical and phenomenological methods of describing classical systems do not quite correlate with each other. In particular, the two methods employ different caloric ideal-gas equations of state, while the possibility, existing in thermodynamic cyclic processes, of obtaining the same distributions either through a change of particle concentration or through a change of temperature is not allowed for in the statistical methods. The above-mentioned difference between the equations of state is removed by using, in the statistical functions corresponding to the canonical Gibbs equations, a new scale factor in place of Planck's constant; this factor depends on the parameters of the system and coincides with Planck's constant as the system approaches the degenerate state. Under such an approach, the statistical entropy is transformed into one of the forms of heat capacity. In turn, reconciling the two methods on the question of the dependence of molecular distributions on particle concentration will apparently call for further refinement of the physical model of the ideal gas and of the techniques for its statistical description.
Lexicalized reordering models are very important components of phrase-based translation systems. By examining the reordering relationships between adjacent phrases, conventional methods learn these models from the word-aligned bilingual corpus while ignoring the effect of the number of adjacent bilingual phrases. In this paper, we propose a method that takes the number of adjacent phrases into account for better estimation of reordering models. Instead of just checking whether there is one phrase adjacent to a given phrase, our method first uses a compact structure named a reordering graph to represent all phrase segmentations of a parallel sentence; the effect of the adjacent phrase number can then be quantified in a forward-backward fashion and finally incorporated into the estimation of the reordering models. Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.
Deterministic compartment models (CMs) and stochastic models, including stochastic CMs and agent-based models, are widely utilized in epidemic modeling. However, the relationship between CMs and their corresponding stochastic models is not well understood. The present study aimed to address this gap by conducting a comparative study using the susceptible, exposed, infectious, and recovered (SEIR) model and its extended CMs from the coronavirus disease 2019 modeling literature. We demonstrated the equivalence of the numerical solution of CMs under the Euler scheme and their stochastic counterparts through theoretical analysis and simulations. Based on this equivalence, we proposed an efficient model calibration method that can replicate the exact solution of CMs in the corresponding stochastic models through parameter adjustment. This advancement in calibration enhances the accuracy of stochastic modeling in capturing the dynamics of epidemics. It should be noted, however, that discrete-time stochastic models cannot perfectly reproduce the exact solution of continuous-time CMs. Additionally, we proposed a new mixed stochastic compartment-and-agent model as an alternative to agent-based models for large-scale population simulations with a limited number of agents, offering a balance between computational efficiency and accuracy. The results of this research contribute to the comparison and unification of deterministic CMs and stochastic models in epidemic modeling, and they have implications for the development of hybrid models that integrate the strengths of both frameworks. Overall, the present study provides valuable epidemic modeling techniques and practical applications for understanding and controlling the spread of infectious diseases.
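The Euler/chain-binomial correspondence at the heart of the study can be sketched as follows: the same per-step rates drive a deterministic update and binomial draws (the parameter values are illustrative, not calibrated):

```python
import numpy as np

def seir_euler(beta, sigma, gamma, N, I0, T, dt=1.0):
    """Deterministic SEIR integrated with the Euler scheme."""
    S, E, I, R = N - I0, 0.0, float(I0), 0.0
    out = [(S, E, I, R)]
    for _ in range(int(T / dt)):
        inf = beta * S * I / N * dt   # new exposures S -> E
        inc = sigma * E * dt          # incubation     E -> I
        rec = gamma * I * dt          # recovery       I -> R
        S, E, I, R = S - inf, E + inf - inc, I + inc - rec, R + rec
        out.append((S, E, I, R))
    return out

def seir_stochastic(beta, sigma, gamma, N, I0, T, dt=1.0, seed=0):
    """Chain-binomial counterpart: the same per-step rates feed binomial draws."""
    rng = np.random.default_rng(seed)
    S, E, I, R = N - I0, 0, I0, 0
    out = [(S, E, I, R)]
    for _ in range(int(T / dt)):
        inf = rng.binomial(S, min(1.0, beta * I / N * dt))
        inc = rng.binomial(E, min(1.0, sigma * dt))
        rec = rng.binomial(I, min(1.0, gamma * dt))
        S, E, I, R = S - inf, E + inf - inc, I + inc - rec, R + rec
        out.append((S, E, I, R))
    return out

det = seir_euler(0.4, 0.2, 0.1, N=10_000, I0=10, T=100)
sto = seir_stochastic(0.4, 0.2, 0.1, N=10_000, I0=10, T=100)
print(int(det[-1][3]), sto[-1][3])  # final recovered counts, both trajectories
```

The stochastic trajectory fluctuates around the deterministic one; averaging many seeds recovers the Euler solution closely, which is the equivalence the calibration method exploits.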
This contribution deals with a generative approach to the analysis of textual data. Instead of creating heuristic rules for the representation of documents and word counts, we employ a distribution able to model words along texts considering different topics. In this regard, following Minka's proposal (2003), we implement a Dirichlet Compound Multinomial (DCM) distribution; we then propose an extension called sbDCM that explicitly takes into account the different latent topics that compose the document. We follow two alternative approaches: on the one hand, the topics can be unknown and thus estimated from the data; on the other hand, the topics can be determined in advance on the basis of a predefined ontological schema. The two approaches are assessed on real data.
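A sketch of the DCM (Polya) log-likelihood, the quantity that makes the model favor "bursty" word counts; the count vectors and parameter values below are illustrative:

```python
import numpy as np
from scipy.special import gammaln

def dcm_loglik(counts, alpha):
    """Log-likelihood of a word-count vector under the Dirichlet Compound
    Multinomial (Polya) distribution with parameter vector alpha."""
    x = np.asarray(counts, dtype=float)
    a = np.asarray(alpha, dtype=float)
    n, A = x.sum(), a.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()      # multinomial coefficient
            + gammaln(A) - gammaln(n + A)              # Dirichlet normalizer ratio
            + gammaln(x + a).sum() - gammaln(a).sum()) # per-word Polya terms

# Burstiness check: with small alpha the DCM assigns more mass to documents
# that repeat one word than a near-multinomial DCM with large alpha does,
# even though both share the same mean word proportions.
bursty = dcm_loglik([5, 0, 0], [0.1, 0.1, 0.1])
flat = dcm_loglik([5, 0, 0], [10.0, 10.0, 10.0])
print(bursty > flat)
```

The sbDCM extension described above replaces the single alpha vector with a topic-dependent structure, but the likelihood computation has this same shape.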
Funding: Funded by the "Genetic improvement of pig survival" project of the Danish Pig Levy Foundation (Aarhus, Denmark), and by the China Scholarship Council (CSC), which provided a scholarship to the first author.
Abstract: Background: Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study is to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter. Results: We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that, when only live animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model, depending on the genotyping scenario. The scenario of genotyping all individuals, both dead and alive, obtained the highest accuracy. When an equal number of individuals (80%) were genotyped, a random sample of genotyped individuals achieved higher accuracy than genotyping only live individuals. The linear, logit, and probit models achieved similar accuracy. Conclusions: Genomic prediction of pig survival is feasible when only live pigs have genotypes, but genomic information from dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
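The logit model mentioned above links binary survival records to covariates through a sigmoid. A rough numpy sketch of fitting it by Newton-Raphson (the single covariate and coefficients are entirely synthetic, not the simulated pig populations of the study):

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson: P(survive) = sigmoid(X @ b)."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1.0 - p)
        # Newton step: b += (X' W X)^-1 X' (y - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

rng = np.random.default_rng(1)
x = rng.normal(size=500)                 # one synthetic covariate
true_b0, true_b1 = 0.5, 1.2
p = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * x)))
y = (rng.random(500) < p).astype(float)  # binary survival outcome (0/1)
b0, b1 = fit_logit(x[:, None], y)
print(b0, b1)  # estimates near the generating values 0.5 and 1.2
```

Replacing the sigmoid with the standard normal CDF gives the probit link; as the abstract reports, the two typically yield very similar predictive accuracy.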
Abstract: A BLEVE is an explosion involving both the rapid vaporization of a liquid and the rapid expansion of vapor in a vessel. The loss of containment results in a large fireball if the stored chemical is flammable. In order to predict the damage generated by a BLEVE, several authors have proposed analytical or semi-empirical correlations that predict the diameter and the lifetime of the fireball as a function of the quantity of fuel. These models are based on past experiments, which makes their validity with respect to the initial conditions and the nature of the product concerned somewhat arbitrary. This article examines the uncertainty associated with analytical and semi-empirical models of the BLEVE fireball and explores how uncertainties in input data, and the choice of a more or less appropriate model, propagate into the model results. Statistical techniques such as global sensitivity analysis and uncertainty analysis are employed to quantify these uncertainties. An attempt is made to evaluate and select reasonable models available in the literature for characterizing fireballs and their consequences. Correlations were analyzed using statistical methods and BLEVE data (experimental data and data estimated by correlation) to determine the residual sum of squares (RSS) and average absolute deviation (AAD). The analysis revealed that the Center for Chemical Process Safety (CCPS), TNO (Netherlands Organization for Applied Scientific Research), and Gayle models showed a high degree of agreement between the experimental data and the data estimated through correlation.
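The correlations discussed above are power laws in the fuel mass, D = a*M^b. As an illustration, the sketch below uses the commonly cited CCPS-style coefficients (D = 5.8*M^(1/3) m, t = 0.45*M^(1/3) s for masses below roughly 30,000 kg); treat the numbers as assumptions rather than values validated in this paper, and note the AAD helper expects measured data the paper does not reproduce:

```python
def fireball_ccps(mass_kg):
    """CCPS-style power-law correlation for a BLEVE fireball:
    diameter D = 5.8 * M^(1/3) (m), duration t = 0.45 * M^(1/3) (s)."""
    d = 5.8 * mass_kg ** (1.0 / 3.0)
    t = 0.45 * mass_kg ** (1.0 / 3.0)
    return d, t

def aad(observed, predicted):
    """Average absolute deviation between measured and correlated values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

d, t = fireball_ccps(1000.0)   # a hypothetical 1-tonne release
print(d, t)                    # about 58 m and 4.5 s
```

Computing RSS/AAD for each published correlation against the same experimental set, as the paper does, then reduces model selection to comparing these scalar scores.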
Abstract: The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected from 26 stations. The methodological approach is based on the statistical modeling of maximum daily rainfall. Fits were made for several sample sizes and several return periods (2, 5, 10, 20, 50 and 100 years). The main results show that the 30-year series (1931-1960; 1961-1990; 1991-2020) are better fitted by the Gumbel (26.92%-53.85%) and Inverse Gamma (26.92%-46.15%) distributions. The 60-year series (1931-1990; 1961-2020) are better fitted by the Inverse Gamma (30.77%), Gamma (15.38%-46.15%) and Gumbel (15.38%-42.31%) distributions. For the full 90-year record (1931-2020), the Gumbel model is clearly dominant (50%) over the Gamma (34.62%) and Inverse Gamma (15.38%) models. Overall, the Gumbel is the dominant model, particularly in wet periods. Data for periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma distributions.
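The Gumbel fits and return periods described above can be sketched with a method-of-moments estimator and the Gumbel quantile function (the synthetic sample and parameter values are invented for illustration, not the Ivorian station data):

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(x):
    """Method-of-moments Gumbel fit: scale beta = sqrt(6)*std/pi,
    location mu = mean - gamma*beta (gamma = Euler-Mascheroni constant)."""
    beta = np.sqrt(6.0) * np.std(x) / np.pi
    mu = np.mean(x) - EULER_GAMMA * beta
    return mu, beta

def return_level(mu, beta, T):
    """Rainfall depth expected to be exceeded once every T years on average."""
    return mu - beta * np.log(-np.log(1.0 - 1.0 / T))

# deterministic synthetic annual-maximum sample from Gumbel(mu=80, beta=15),
# generated through the inverse CDF on an evenly spaced probability grid
u = (np.arange(1, 5001) - 0.5) / 5000
sample = 80.0 - 15.0 * np.log(-np.log(u))
mu, beta = fit_gumbel_moments(sample)
print(mu, beta)                       # close to 80 and 15
print(return_level(mu, beta, 100))    # ~100-year daily rainfall depth
```

Refitting on 30-, 60- and 90-year subsamples of a real record is then a one-line slicing exercise, which is exactly the sensitivity experiment the abstract describes.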
Funding: National Natural Science Foundation of China (No. 60803078); National High Technology Research and Development Program of China (No. 2006AA010107, No. 2006AA010108).
Abstract: This paper proposes a method to incorporate syntax-based language models into phrase-based statistical machine translation (SMT) systems. The syntax-based language model used in this paper is based on link grammar, a highly lexicalized formalism. In order to apply language models based on link grammar in phrase-based models, the concept of linked phrases, an extension of the traditional phrases of phrase-based models, is introduced. Experiments were conducted, and the results show that the use of syntax-based language models can greatly improve the performance of phrase-based models.
基金supported in part by the NIH grant R01CA241134supported in part by the NSF grant CMMI-1552764+3 种基金supported in part by the NSF grants DMS-1349724 and DMS-2052465supported in part by the NSF grant CCF-1740761supported in part by the U.S.-Norway Fulbright Foundation and the Research Council of Norway R&D Grant 309273supported in part by the Norwegian Centennial Chair grant and the Doctoral Dissertation Fellowship from the University of Minnesota.
Abstract: The spread of an advantageous mutation through a population is of fundamental interest in population genetics. While the classical Moran model is formulated for a well-mixed population, it has long been recognized that in real-world applications, the population usually has an explicit spatial structure which can significantly influence the dynamics. In the context of cancer initiation in epithelial tissue, several recent works have analyzed the dynamics of advantageous mutant spread on integer lattices, using the biased voter model from particle systems theory. In this spatial version of the Moran model, individuals first reproduce according to their fitness and then replace a neighboring individual. From a biological standpoint, the opposite dynamics, where individuals first die and are then replaced by a neighboring individual according to its fitness, are equally relevant. Here, we investigate this death-birth analogue of the biased voter model. We construct the process mathematically, derive the associated dual process, establish bounds on the survival probability of a single mutant, and prove that the process has an asymptotic shape. We also briefly discuss alternative birth-death and death-birth dynamics, depending on how the mutant fitness advantage affects the dynamics. We show that birth-death and death-birth formulations of the biased voter model are equivalent when fitness affects the former event of each update of the model, whereas the birth-death model is fundamentally different from the death-birth model when fitness affects the latter event.
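The death-birth update studied above is easy to simulate on a small graph: a uniformly chosen individual dies and is replaced by a neighbor picked with probability proportional to fitness. A toy Monte Carlo sketch on a cycle (lattice size, fitness value, and trial count are illustrative choices, not parameters from the paper):

```python
import random

def death_birth_trial(n=10, fitness=2.0, rng=None):
    """One run of the death-birth biased voter model on a cycle of n sites:
    a uniform site dies and copies one of its two neighbors, chosen with
    probability proportional to neighbor fitness (mutant = `fitness`,
    resident = 1).  Returns True if the single initial mutant fixes."""
    state = [0] * n
    state[0] = 1                      # one mutant at site 0
    while 0 < sum(state) < n:
        i = rng.randrange(n)          # death event: uniform over sites
        left, right = state[(i - 1) % n], state[(i + 1) % n]
        w_left = fitness if left == 1 else 1.0
        w_right = fitness if right == 1 else 1.0
        state[i] = left if rng.random() < w_left / (w_left + w_right) else right
    return sum(state) == n

rng = random.Random(0)
trials = 400
fixations = sum(death_birth_trial(rng=rng) for _ in range(trials))
print(fixations / trials)   # estimated fixation probability of the mutant
```

Swapping the order of the two events (choose a reproducer by fitness first, then overwrite a neighbor) gives the birth-death variant whose equivalence or non-equivalence the paper analyzes.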
Funding: Supported by the bread wheat project of the Dryland Agricultural Research Institute (DARI) and the Agricultural Research and Education Organization (AREO) of Iran.
Abstract: Several statistical methods have been developed for analyzing genotype × environment (GE) interactions in crop breeding programs to identify genotypes with high yield and stability performance. Four statistical methods, including joint regression analysis (JRA), additive main effects and multiplicative interaction (AMMI) analysis, genotype plus GE interaction (GGE) biplot analysis, and the yield-stability (YSi) statistic, were used to evaluate GE interaction in 20 winter wheat genotypes grown in 24 environments in Iran. The main objective was to evaluate the rank correlations among the four statistical methods in genotype rankings for yield, stability, and yield-stability. Three kinds of genotypic ranks (yield ranks, stability ranks, and yield-stability ranks) were determined with each method. The results indicated the presence of GE interaction, suggesting the need for stability analysis. With respect to yield, the genotype rankings by the GGE biplot and AMMI analysis were significantly correlated (P<0.01). For stability ranking, the rank correlations ranged from 0.53 (GGE-YSi; P<0.05) to 0.97 (JRA-YSi; P<0.01). AMMI distance (AMMID) was highly correlated (P<0.01) with the variance of regression deviations (S2di) in JRA (r=0.83) and Shukla's stability variance (σ2) in YSi (r=0.86), indicating that these stability indices can be used interchangeably. No correlation was found between yield ranks and stability ranks (AMMID, S2di, σ2, and the GGE stability index), indicating that they measure static stability and accordingly could be used if selection is based primarily on stability. For yield-stability, rank correlation coefficients among the statistical methods varied from 0.64 (JRA-YSi; P<0.01) to 0.89 (AMMI-YSi; P<0.01), indicating that AMMI and YSi were closely associated in ranking genotypes for yield combined with stability. Based on the results, it can be concluded that YSi was closely correlated with (i) JRA in ranking genotypes for stability and (ii) AMMI in integrating yield and stability.
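The rank correlations reported above are Spearman coefficients: Pearson correlations of the rank vectors produced by each method. A minimal sketch (the six "genotype" stability scores are invented; tie handling is omitted):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors.
    (Average-rank handling of ties is omitted in this sketch.)"""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# hypothetical stability scores of six genotypes under two methods
ammid = np.array([1.2, 0.4, 2.9, 0.8, 1.7, 2.1])   # e.g., AMMI distance
shukla = np.array([0.9, 0.3, 2.5, 0.7, 1.9, 1.6])  # e.g., Shukla variance
print(spearman(ammid, shukla))   # high: the two rankings differ by one swap
```

Running this over every pair of methods and every ranking type reproduces the kind of correlation matrix the study reports.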
Funding: Supported by the Guangdong Technological Program (2009B02001002), the Special Funds of the National Agricultural Department for Commonweal Trade Research (nyhyzx07-019), and the Earmarked Fund for the Modern Agro-industry Technology Research System.
Abstract: [Objective] The study aimed to compare several statistical models for estimating sugarcane (Saccharum spp.) genotypic stability. [Method] Data from the 2009 sugarcane regional trials in Guangdong were analyzed with three models: the Finlay-Wilkinson model, the additive main effects and multiplicative interaction (AMMI) model, and the linear regression-principal components analysis (LR-PCA) model. [Result] The Finlay-Wilkinson model was simpler to apply, while the other two models gave a more comprehensive analysis, with only minor differences between the AMMI and LR-PCA models. [Conclusion] In practice, the statistical method should be chosen according to the data at hand, but the same data should also be analyzed with different statistical methods, so that comparison yields a more reasonable result.
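The Finlay-Wilkinson model named above regresses each genotype's yield on an environment index (the mean yield of all genotypes in that environment); the slope is the stability measure. A sketch on an invented genotype-by-environment table (not the Guangdong trial data):

```python
import numpy as np

def finlay_wilkinson(yields):
    """Finlay-Wilkinson joint regression: regress each genotype's yield on the
    environment index (column mean of the G x E table).  A slope near 1 means
    average responsiveness; >1 environment-sensitive, <1 relatively stable."""
    env_index = yields.mean(axis=0)                  # one index per environment
    x = env_index - env_index.mean()
    slopes = (yields - yields.mean(axis=1, keepdims=True)) @ x / (x @ x)
    return slopes

# hypothetical 3 genotypes x 4 environments yield table
y = np.array([[5.0, 6.0, 7.0, 8.0],     # tracks the environment (slope near 1)
              [6.0, 6.1, 6.2, 6.3],     # stable genotype (small slope)
              [4.0, 6.0, 8.0, 10.0]])   # highly responsive genotype
print(finlay_wilkinson(y))
```

By construction the slopes average to 1 across genotypes, which is a convenient sanity check on any implementation.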
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2010AA101301), the Program of Introducing International Advanced Agricultural Science and Technology in China (Grant No. 2006-G8[4]-31-1), and the Program of Science-Technology Basis and Conditional Platform in China (Grant No. 505005).
Abstract: QTL mapping for seven quality traits was conducted using 254 recombinant inbred lines (RIL) derived from the japonica-japonica rice cross Xiushui 79/C Bao. The seven traits investigated were grain length (GL), grain length-to-width ratio (LWR), chalky grain rate (CGR), chalkiness degree (CD), gelatinization temperature (GT), amylose content (AC) and gel consistency (GC) of head rice. The three mapping methods employed were composite interval mapping in QTLMapper 2.0 based on a mixed linear model (MCIM), inclusive composite interval mapping in QTL IciMapping 3.0 based on a stepwise regression linear model (ICIM), and multiple interval mapping with regression forward selection in Windows QTL Cartographer 2.5 based on multiple regression analysis (MIMR). Results showed that five QTLs with additive effects (A-QTLs) were detected by all three methods simultaneously, two by two methods simultaneously, and 23 by only one method. Five A-QTLs were detected by MCIM, nine by ICIM and 28 by MIMR. The contribution rates of single A-QTLs ranged from 0.89% to 38.07%. None of the QTLs with epistatic effects (E-QTLs) detected by MIMR were detected by the other two methods. Fourteen pairs of E-QTLs were detected by both MCIM and ICIM, and 142 pairs were detected by only one method. Twenty-five pairs of E-QTLs were detected by MCIM, 141 pairs by ICIM and four pairs by MIMR. The contribution rates of single pairs of E-QTLs ranged from 2.60% to 23.78%. In the Xiu-Bao RIL population, epistatic effects played a major role in the variation of GL and CD, additive effects were dominant in the variation of LWR, and epistatic and additive effects were of equal importance in the variation of CGR, AC, GT and GC. QTLs detected by two or more methods simultaneously are highly reliable and could be applied to improve quality traits in japonica hybrid rice.
Abstract: The water resources of the Nadhour-Sisseb-El Alem Basin in Tunisia are subject to semi-arid to arid climatic conditions. These induce excessive pumping of groundwater, which creates drops in the water level of about 1-2 m per year. Such unfavorable conditions require interventions to rationalize integrated management in decision making. The aim of this study is to determine a water recharge index (WRI), delineate the potential groundwater recharge area, and estimate the potential groundwater recharge rate, based on the integration of statistical models derived from remote sensing imagery, GIS digital data (e.g., lithology, soil, runoff), measured artificial recharge data, fuzzy set theory, and multi-criteria decision making (MCDM) using the analytic hierarchy process (AHP). Eight factors affecting potential groundwater recharge were considered, namely lithology, soil, slope, topography, land cover/use, runoff, drainage, and lineaments. The WRI ranges between 1.2 and 3.1 and was classified into five classes (poor, weak, moderate, good, and very good potential recharge sites). The very good and good classes occupied 27% and 44% of the study area, respectively. The potential groundwater recharge rate was 43% of total precipitation. According to the results of the study, river beds are favorable sites for groundwater recharge.
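The AHP step referred to above derives factor weights from a pairwise comparison matrix via its principal eigenvector, with a consistency index to flag contradictory judgments. A minimal sketch (the 3x3 comparison matrix is a made-up example, not the study's eight-factor matrix):

```python
import numpy as np

def ahp_weights(pairwise):
    """AHP priority vector via the principal eigenvector of the pairwise
    comparison matrix (power iteration), plus the consistency index CI."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    w = np.ones(n) / n
    for _ in range(100):                # power iteration
        w = A @ w
        w /= w.sum()
    lam = (A @ w / w).mean()            # principal eigenvalue estimate
    ci = (lam - n) / (n - 1)            # consistency index (0 = fully consistent)
    return w, ci

# hypothetical comparisons: factor 1 twice as important as factor 2,
# four times as important as factor 3; factor 2 twice as important as factor 3
A = [[1, 2, 4],
     [1 / 2, 1, 2],
     [1 / 4, 1 / 2, 1]]
w, ci = ahp_weights(A)
print(w, ci)   # consistent matrix: weights 4/7, 2/7, 1/7 and CI ~ 0
```

The weighted overlay of the eight factor maps with these weights is what produces the WRI surface that the study classifies into five recharge-potential classes.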
Abstract: Forecasting the movement of the stock market is a long-standing and attractive topic. This paper implements different statistical learning models to predict the movement of the S&P 500 index, which is influenced by other important financial indexes across the world, such as commodity prices and financial technical indicators. The paper systematically investigates four supervised learning models, including Logistic Regression, Gaussian Discriminant Analysis (GDA), Naive Bayes, and Support Vector Machine (SVM), for forecasting the S&P 500 index. After several rounds of optimization of features and models, especially SVM kernel selection and per-model feature selection, the paper concludes that an SVM model with a Radial Basis Function (RBF) kernel can achieve an accuracy rate of 62.51% for the future market trend of the S&P 500 index.
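Of the four models compared above, GDA has the simplest closed-form fit: per-class Gaussian means with a shared covariance matrix. A numpy sketch on synthetic "up day"/"down day" features (the two indicators and their distributions are invented, not the paper's feature set):

```python
import numpy as np

def fit_gda(X, y):
    """Gaussian Discriminant Analysis with a shared covariance matrix
    (for two classes this reduces to linear discriminant analysis)."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.where(y[:, None] == 1, X - mu1, X - mu0)
    sigma = centered.T @ centered / len(X)
    phi = y.mean()                       # class prior P(y = 1)
    return mu0, mu1, sigma, phi

def predict_gda(X, mu0, mu1, sigma, phi):
    inv = np.linalg.inv(sigma)
    # compare the log posteriors of the two shared-covariance Gaussians
    s0 = -0.5 * np.sum((X - mu0) @ inv * (X - mu0), axis=1) + np.log(1 - phi)
    s1 = -0.5 * np.sum((X - mu1) @ inv * (X - mu1), axis=1) + np.log(phi)
    return (s1 > s0).astype(int)

rng = np.random.default_rng(7)
X = np.vstack([rng.normal([-1, -1], 1, (200, 2)),   # synthetic down days
               rng.normal([1, 1], 1, (200, 2))])    # synthetic up days
y = np.repeat([0, 1], 200)
params = fit_gda(X, y)
acc = (predict_gda(X, *params) == y).mean()
print(acc)   # well above chance on this separable toy data
```

The RBF-kernel SVM that won in the paper replaces this linear decision boundary with a nonlinear one, which is what lifts accuracy when the up/down classes are not linearly separable.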
基金We acknowledge funding from NSFC Grant 62306283.
Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models significantly enhances performance, introducing abilities such as in-context learning that smaller models lack. The advancement of Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Abstract: Recently, some results have been obtained with Monte Carlo statistical experiments in ocean engineering design. The results show that Monte Carlo statistical experiments can be widely used in estimating the parameters of wave statistical distributions, checking the probability model of the long-term wave extreme value distribution under typhoon conditions, and calculating the failure probability of ocean platforms.
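The failure-probability use case mentioned above reduces to estimating P(load > resistance) by sampling. A toy sketch (the Gaussian load/resistance parameters are invented for illustration; real wave loads would use the fitted extreme-value distributions):

```python
import random

def failure_probability(n_trials=200_000, seed=0):
    """Monte Carlo estimate of P(load > resistance) for a simple
    load-resistance model with illustrative Gaussian inputs."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        load = rng.gauss(50.0, 10.0)        # e.g., wave-induced load
        resistance = rng.gauss(80.0, 10.0)  # e.g., platform capacity
        failures += load > resistance
    return failures / n_trials

print(failure_probability())   # roughly 1 - Phi(30 / sqrt(200)), about 0.017
```

The same sampling loop, with the load drawn from a fitted Gumbel or Weibull wave-height model, gives the platform failure probabilities the abstract refers to.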
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61773091 and 61603073), the LiaoNing Revitalization Talents Program (Grant No. XLYC1807106), and the Natural Science Foundation of Liaoning Province, China (Grant No. 2020-MZLH-22).
Abstract: The establishment of effective null models can provide reference networks for accurately describing the statistical properties of real-life signed networks. At present, the two classical null models for signed networks (the sign-randomized and full-edge-randomized models) shuffle the positive and negative topologies at the same time, so it is difficult to distinguish the effects on network topology of positive edges, negative edges, and the correlation between them. In this study, we construct three refined edge-randomized null models that randomize only the link relationships without changing the positive and negative degree distributions. The results for nontrivial statistical indicators of signed networks, such as average degree connectivity and clustering coefficient, show that the position of positive edges has a stronger effect on the positive-edge topology, while the signs of negative edges have a greater influence on the negative-edge topology. For some specific statistics (e.g., embeddedness), the results indicate that the proposed null models describe real-life networks more accurately than the two existing ones, and they can be selected to facilitate a better understanding of the complex structures, functions, and dynamical behaviors of signed networks.
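The classical sign-randomized null model mentioned above keeps the underlying topology fixed and shuffles only the +/- labels. A minimal sketch (edge list and seed are illustrative; the paper's three refined models impose additional degree constraints this sketch does not):

```python
import random

def sign_randomized_null(signed_edges, seed=0):
    """Sign-randomized null model: keep which node pairs are connected fixed
    and shuffle the +/- signs among the edges, preserving the total number
    of positive and negative edges."""
    rng = random.Random(seed)
    edges = [e for e, _ in signed_edges]
    signs = [s for _, s in signed_edges]
    rng.shuffle(signs)
    return list(zip(edges, signs))

g = [((0, 1), +1), ((1, 2), -1), ((2, 3), +1), ((3, 0), -1), ((0, 2), +1)]
g_null = sign_randomized_null(g)
# topology unchanged, sign counts preserved
print(sorted(e for e, _ in g_null) == sorted(e for e, _ in g))        # True
print(sum(s > 0 for _, s in g_null) == sum(s > 0 for _, s in g))      # True
```

Comparing a statistic (clustering, embeddedness, etc.) on the real network against its distribution over many such shuffles is the standard way these null models are used.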
Abstract: The cause-effect relationship is not always possible to trace in GCMs because of the simultaneous inclusion of several highly complex physical processes. Furthermore, the inter-GCM differences are large, and there is no simple way to reconcile them. Simple climate models, such as statistical-dynamical models (SDMs), are therefore useful in this context. This kind of model is essentially mechanistic, being directed towards understanding the dependence of a particular mechanism on the other parameters of the problem. In this paper, the utility of SDMs for studies of climate change is discussed in some detail. We show that these models are an indispensable part of the hierarchy of climate models.
Abstract: The paper performs a critical analysis of the problems arising in reconciling the classical models of statistical and phenomenological thermodynamics. The analysis shows that some concepts of the statistical and phenomenological methods of describing classical systems do not quite correlate with each other. In particular, the two methods employ different caloric ideal gas equations of state, while the possibility, existing in thermodynamic cyclic processes, of obtaining the same distributions both from a change of particle concentration and from a change of temperature is not allowed for in the statistical methods. The above-mentioned difference between the equations of state is removed by using, in the statistical functions corresponding to the canonical Gibbs equations, a new scale factor in place of Planck's constant, one that depends on the parameters of the system and coincides with Planck's constant as the system approaches the degenerate state. Under such an approach, the statistical entropy is transformed into one of the forms of heat capacity. In turn, reconciling the two methods on the question of the dependence of the molecular distributions on particle concentration will apparently call for further refinement of the physical model of the ideal gas and of the techniques for its statistical description.
基金supported by the National Natural Science Foundation of China(No.61303082) the Research Fund for the Doctoral Program of Higher Education of China(No.20120121120046)
Abstract: Lexicalized reordering models are very important components of phrase-based translation systems. By examining the reordering relationships between adjacent phrases, conventional methods learn these models from the word-aligned bilingual corpus while ignoring the effect of the number of adjacent bilingual phrases. In this paper, we propose a method that takes the number of adjacent phrases into account for better estimation of reordering models. Instead of just checking whether there is one phrase adjacent to a given phrase, our method first uses a compact structure named a reordering graph to represent all phrase segmentations of a parallel sentence; the effect of the number of adjacent phrases can then be quantified in a forward-backward fashion and finally incorporated into the estimation of the reordering models. Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.
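The forward-backward computation over a segmentation lattice can be illustrated generically: on a DAG whose paths are the segmentations, the fraction of paths using an edge is forward[u] * backward[v] / total_paths. A toy sketch (the four-node lattice is invented; the paper's reordering graph also carries orientation labels this sketch omits):

```python
def edge_path_fractions(n_nodes, edges):
    """Forward-backward on a DAG (nodes 0..n-1, edges in topological order):
    the fraction of source-to-sink paths using each edge is
    forward[u] * backward[v] / total_paths, the quantity used to weight
    items by how many segmentations (paths) pass through them."""
    fwd = [0] * n_nodes
    bwd = [0] * n_nodes
    fwd[0] = bwd[n_nodes - 1] = 1
    for u, v in edges:                       # forward pass: paths from source
        fwd[v] += fwd[u]
    for u, v in reversed(edges):             # backward pass: paths to sink
        bwd[u] += bwd[v]
    total = fwd[n_nodes - 1]
    return {(u, v): fwd[u] * bwd[v] / total for u, v in edges}

# tiny lattice: two ways from node 0 to node 2, then a single edge to node 3
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(edge_path_fractions(4, edges))
# {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.5, (2, 3): 1.0}
```

Edge (2, 3) lies on every path and gets weight 1, while each of the two routes from 0 to 2 gets weight 0.5; replacing path counts with probabilities gives the soft counts used in model estimation.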
基金supported by the National Natural Science Foundation of China(Grant Nos.82173620 to Yang Zhao and 82041024 to Feng Chen)partially supported by the Bill&Melinda Gates Foundation(Grant No.INV-006371 to Feng Chen)Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: Deterministic compartment models (CMs) and stochastic models, including stochastic CMs and agent-based models, are widely used in epidemic modeling. However, the relationship between CMs and their corresponding stochastic models is not well understood. The present study aimed to address this gap by conducting a comparative study using the susceptible-exposed-infectious-recovered (SEIR) model and its extended CMs from the COVID-19 modeling literature. We demonstrated the equivalence between the numerical solution of CMs under the Euler scheme and their stochastic counterparts through theoretical analysis and simulations. Based on this equivalence, we proposed an efficient model calibration method that can replicate the exact solution of CMs in the corresponding stochastic models through parameter adjustment. This advancement in calibration techniques enhances the accuracy of stochastic modeling in capturing epidemic dynamics. However, it should be noted that discrete-time stochastic models cannot perfectly reproduce the exact solution of continuous-time CMs. Additionally, we proposed a new mixed stochastic compartment-agent model as an alternative to agent-based models for large-scale population simulations with a limited number of agents. This model offers a balance between computational efficiency and accuracy. The results of this research contribute to the comparison and unification of deterministic CMs and stochastic models in epidemic modeling, and have implications for the development of hybrid models that integrate the strengths of both frameworks. Overall, the present study provides valuable epidemic modeling techniques and practical applications for understanding and controlling the spread of infectious diseases.
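The Euler-scheme/stochastic correspondence discussed above can be sketched side by side: each deterministic Euler flow becomes the expected value of a per-step binomial draw in a chain-binomial SEIR (parameter values are invented for illustration, not calibrated to COVID-19):

```python
import numpy as np

def seir_euler(s, e, i, r, beta, sigma, gamma, dt, steps):
    """Deterministic SEIR integrated with the explicit Euler scheme."""
    n = s + e + i + r
    for _ in range(steps):
        new_e = beta * s * i / n * dt     # flow S -> E
        new_i = sigma * e * dt            # flow E -> I
        new_r = gamma * i * dt            # flow I -> R
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
    return s, e, i, r

def seir_binomial(s, e, i, r, beta, sigma, gamma, dt, steps, rng):
    """Stochastic counterpart: the Euler flows become expectations of per-step
    binomial draws (a discrete-time chain-binomial SEIR)."""
    n = s + e + i + r
    for _ in range(steps):
        new_e = rng.binomial(s, 1 - np.exp(-beta * i / n * dt))
        new_i = rng.binomial(e, 1 - np.exp(-sigma * dt))
        new_r = rng.binomial(i, 1 - np.exp(-gamma * dt))
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
    return s, e, i, r

args = dict(beta=0.4, sigma=0.2, gamma=0.1, dt=0.5, steps=200)
print(seir_euler(9990, 0, 10, 0, **args))
print(seir_binomial(9990, 0, 10, 0, **args, rng=np.random.default_rng(3)))
```

Averaging many stochastic runs approaches the Euler trajectory, which is the equivalence the paper exploits for calibration; the residual gap to the continuous-time CM is the discretization error the abstract cautions about.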
Abstract: This contribution deals with a generative approach to the analysis of textual data. Instead of creating heuristic rules for the representation of documents and word counts, we employ a distribution able to model words along texts while accounting for different topics. In this regard, following Minka's proposal (2003), we implement a Dirichlet Compound Multinomial (DCM) distribution, and then propose an extension, called sbDCM, that explicitly takes into account the different latent topics that compose the document. We follow two alternative approaches: on the one hand, the topics can be unknown and thus estimated from the data; on the other hand, topics can be determined in advance on the basis of a predefined ontological schema. The two approaches are assessed on real data.
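The DCM generative story referred to above is two draws per document: word probabilities from a Dirichlet, then counts from a multinomial. A minimal sampling sketch (the alpha vector and sizes are illustrative, not estimates from the paper's corpora):

```python
import numpy as np

def sample_dcm(alpha, n_words, n_docs, rng):
    """Draw word-count vectors from a Dirichlet Compound Multinomial:
    per document, p ~ Dirichlet(alpha), then counts ~ Multinomial(n_words, p).
    The extra Dirichlet step produces the word burstiness (overdispersion)
    that a plain multinomial cannot capture."""
    docs = np.empty((n_docs, len(alpha)), dtype=int)
    for d in range(n_docs):
        p = rng.dirichlet(alpha)
        docs[d] = rng.multinomial(n_words, p)
    return docs

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.3, 0.2])       # small alphas -> bursty word usage
docs = sample_dcm(alpha, n_words=100, n_docs=1000, rng=rng)
print(docs.sum(axis=1)[:5])             # every document has exactly 100 words
print(docs.mean(axis=0) / 100)          # close to alpha / alpha.sum()
```

The sbDCM extension described in the abstract would replace the single alpha with a combination over latent or ontology-given topics, but the per-document sampling mechanics stay the same.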