Two-stage adaptive cluster sampling and two-stage conventional sampling designs were used to estimate the population total of the Fringe-Eared Oryx, which is clustered and sparsely distributed. The study region was the Amboseli-West Kilimanjaro and Magadi-Natron cross-border landscape between Kenya and Tanzania. The region was partitioned into primary sampling units containing secondary sampling units of different sizes. Results show that the two-stage adaptive cluster sampling design is more efficient than both simple random sampling and the conventional two-stage sampling design, and its estimates are less variable than those of the conventional two-stage design.
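The conventional two-stage design compared in this abstract expands within-PSU totals and then across PSUs. A minimal sketch of the unbiased two-stage expansion estimator on a hypothetical population (the population and sizes below are illustrative, not the study's data):

```python
import random

def two_stage_total(psus, n, m, rng):
    """Unbiased two-stage estimate of a population total: take an SRS of n
    primary units, an SRS of up to m secondary units within each, and expand
    at both stages: Y_hat = (N/n) * sum_i (M_i/m_i) * sum_j y_ij."""
    N = len(psus)
    chosen = rng.sample(psus, n)
    est = 0.0
    for psu in chosen:
        ssus = rng.sample(psu, min(m, len(psu)))
        est += (len(psu) / len(ssus)) * sum(ssus)  # expand to the PSU total
    return est * N / n                             # expand across PSUs
```

Averaged over repeated draws, the estimate matches the true total, which is the design-unbiasedness property both designs in the abstract rely on.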
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, with many application fields such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases. Funding: Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
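The "sampling outside DBSCAN" idea can be illustrated with a toy version: cluster a random sample with a minimal DBSCAN, then attach each remaining point to the cluster of its nearest sampled point within eps. This is an illustrative sketch, not the paper's SDBSCAN; all parameters and the assignment rule are hypothetical:

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                    # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster           # former noise becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:          # j is core: keep expanding
                queue.extend(nbrs)
    return labels

def sampled_dbscan(points, eps, min_pts, rate, rng):
    """Cluster a random sample, then attach each remaining point to the
    cluster of its nearest sampled point within eps (else noise)."""
    idx = list(range(len(points)))
    sample = rng.sample(idx, max(min_pts + 1, int(rate * len(points))))
    sample_labels = dbscan([points[i] for i in sample], eps, min_pts)
    labels = [-1] * len(points)
    for k, i in enumerate(sample):
        labels[i] = sample_labels[k]
    for i in idx:
        if i in sample:
            continue
        k, d = min(((k, math.dist(points[i], points[j]))
                    for k, j in enumerate(sample)), key=lambda t: t[1])
        if d <= eps:
            labels[i] = sample_labels[k]
    return labels
```

The sketch trades the full O(n^2) neighborhood scan for one over the sample, which is the cost the paper's sampling-based variants target.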
Adaptive cluster sampling (ACS) has been a very important tool in the estimation of population parameters of rare and clustered populations. The fundamental idea behind this sampling plan is to draw an initial sample from a defined population and to keep sampling within the vicinity of any initially selected unit that exhibits the characteristic of interest. Despite being an important tool for sampling rare and clustered populations, the ACS design cannot control the final sample size when no prior knowledge of the population is available. Adaptive cluster sampling with a data-driven stopping rule (ACS') was therefore proposed to control the final sample size when prior knowledge of the population structure is not available. This study examined the behavior of the Horvitz-Thompson (HT) and Hansen-Hurwitz (HH) estimators under the ACS and ACS' designs using an artificial population designed to have all the characteristics of a rare and clustered population. The efficiencies of the HT and HH estimators were used to determine the most efficient design for estimating the population mean of a rare and clustered population. Results for both the simulated data and the real data show that adaptive cluster sampling with the stopping rule is more efficient for estimating rare and clustered populations than ordinary adaptive cluster sampling.
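The HH estimator discussed here can be sketched on a toy one-dimensional population: the estimate is the average, over the initial simple random sample, of the mean of each selected unit's network (a unit failing the condition is its own network). An illustrative sketch; the artificial population in the test is hypothetical, not the study's:

```python
import random

def networks_1d(y):
    """Map each unit to its network: maximal runs of adjacent units with
    y > 0; a unit with y == 0 is its own network of size one."""
    net, i = {}, 0
    while i < len(y):
        if y[i] > 0:
            j = i
            while j < len(y) and y[j] > 0:
                j += 1
            members = list(range(i, j))
            for k in members:
                net[k] = members
            i = j
        else:
            net[i] = [i]
            i += 1
    return net

def hh_mean_acs(y, n0, rng):
    """Hansen-Hurwitz-type ACS estimator of the population mean: average,
    over an initial SRS of size n0, of each selected unit's network mean."""
    net = networks_1d(y)
    initial = rng.sample(range(len(y)), n0)
    w = [sum(y[k] for k in net[i]) / len(net[i]) for i in initial]
    return sum(w) / n0
```

Replicating the draw many times shows the estimator is unbiased for the population mean, the property the study's efficiency comparison builds on.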
Non-response is a regular occurrence in sample surveys. Estimators developed when non-response exists may carry large biases when estimating population parameters. In this paper, a finite population mean is estimated when non-response occurs at random under two-stage cluster sampling with replacement. It is assumed that non-response arises in the survey variable at the second stage of cluster sampling. A weighting method of compensating for non-response is applied. Asymptotic properties of the proposed estimator of the population mean are derived, and under mild assumptions the estimator is shown to be asymptotically consistent.
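The weighting adjustment described here can be sketched as follows: responders in each second-stage sample are weighted by the inverse of the cluster's observed response rate. A minimal illustration under simplifying assumptions (equal-probability selection, response missing completely at random, hypothetical data), not the paper's estimator:

```python
import random

def weighted_mean_nonresponse(clusters, rng, p_respond=0.7):
    """Estimate a population mean when second-stage units respond at random:
    responders in each cluster are weighted by the inverse of the cluster's
    observed response rate (a weighting-class adjustment)."""
    total, count = 0.0, 0
    for cluster in clusters:
        responses = [y for y in cluster if rng.random() < p_respond]
        if not responses:          # no responders: cluster contributes nothing
            continue
        weight = len(cluster) / len(responses)   # inverse response rate
        total += weight * sum(responses)         # adjusted cluster total
        count += len(cluster)
    return total / count
```

Because the weight rescales responder totals back to the cluster size, the adjusted mean recovers the target mean on average despite the missing second-stage observations.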
In this paper, we study estimators of the population mean in stratified adaptive cluster sampling that use information from auxiliary variables. Simulations showed that if the variable of interest (y) and the auxiliary variables (x, z) have a high positive correlation, then the estimated mean square error of the ratio estimators is smaller than that of the product estimator. The estimators using only one auxiliary variable performed better than those using two auxiliary variables.
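The ratio and product estimators compared here have the classical single-auxiliary forms: the ratio estimator scales the sample mean of y by X_bar/x_bar, the product estimator by x_bar/X_bar. A small sketch of the plain (non-stratified, non-ACS) versions with hypothetical data:

```python
import random

def ratio_estimate(y, x, x_bar_pop):
    """Ratio estimator of the mean: y_bar * (X_bar / x_bar).
    Suited to strong positive correlation between y and x."""
    return (sum(y) / len(y)) * x_bar_pop / (sum(x) / len(x))

def product_estimate(y, x, x_bar_pop):
    """Product estimator of the mean: y_bar * (x_bar / X_bar).
    Suited to strong negative correlation between y and x."""
    return (sum(y) / len(y)) * (sum(x) / len(x)) / x_bar_pop
```

Simulating repeated samples from a population where y rises with x reproduces the abstract's finding: under high positive correlation the ratio estimator's mean square error is far below the product estimator's.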
For imbalanced datasets, the focus of classification is to identify samples of the minority class, and the performance of current data mining algorithms on such datasets is not good enough. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority-class examples by interpolating between nearby minority-class examples. However, SMOTE suffers from an over-generalization problem, and density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm to make its clustering more reasonable for this problem. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the samples of the minority class into three groups: core samples, borderline samples, and noise samples; the noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value. Funding: Supported by the National Key Research and Development Program of China (2018YFB1003700), the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776), the "333" Project of Jiangsu Province (BRA2017228, BRA2017401), and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
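The SMOTE step that DSMOTE builds on interpolates between a minority example and one of its k nearest minority neighbours. A basic sketch of that step alone (not DSMOTE's optimized grouping; parameters are hypothetical):

```python
import math
import random

def smote(minority, n_new, k, rng):
    """Basic SMOTE: each synthetic example lies on the line segment between
    a random minority point and one of its k nearest minority neighbours,
    synthetic = x + u * (nn - x) with u ~ U(0, 1). This is the step whose
    over-generalization DSMOTE addresses by first removing noise points."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        nbrs = sorted((p for p in minority if p is not x),
                      key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(nbrs)
        u = rng.random()
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic
```

Because every synthetic point is a convex combination of two minority points, it always falls between existing minority examples, which is also why interpolating from a noise point spreads the minority region into the majority class.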
There are two distinct types of domains, design-classes and cross-classes domains, with the former extensively studied under the topic of small-area estimation. In natural resource inventory, however, most classes listed in the condition tables of national inventory programs are characterized as cross-classes domains, such as vegetation type, productivity class, and age class. To date, challenges remain active for inventorying cross-classes domains because these domains usually have an unknown sampling frame and spatial distribution, so inference relies on population-level rather than domain-level sampling. Multiple challenges are noteworthy: (1) efficient sampling strategies are difficult to develop because of little a priori information about the target domain; (2) domain inference relies on a sample designed for the population, so within-domain sample sizes could be too small to support precise estimation; and (3) increasing the sample size for the population does not ensure an increase for the domain, so the actual sample size for a target domain remains highly uncertain, particularly for small domains. In this paper, we introduce a design-based generalized systematic adaptive cluster sampling (GSACS) for inventorying cross-classes domains. Design-unbiased Hansen-Hurwitz and Horvitz-Thompson estimators are derived for domain totals and compared within GSACS and with systematic sampling (SYS). Comprehensive Monte Carlo simulations show that (1) the GSACS Hansen-Hurwitz and Horvitz-Thompson estimators are unbiased and equally efficient, whereas the latter outperforms the former in supporting a sample of size one; (2) SYS is a special case of GSACS, while the latter outperforms the former in terms of increased efficiency and reduced intensity; (3) the GSACS Horvitz-Thompson variance estimator is design-unbiased for a single SYS sample; and (4) rules-of-thumb summarized with respect to sampling design and spatial effect improve precision. Because inventorying a mini domain is analogous to inventorying a rare variable, alternative network sampling procedures are also readily available for inventorying cross-classes domains. Funding: Supported by the Fundamental Research Funds for the Central Universities (2021ZY04), the National Natural Science Foundation of China (32001252), and the International Center for Bamboo and Rattan (1632020029).
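The Horvitz-Thompson estimator referenced above divides each intersected network's total by its inclusion probability under the initial sample. A one-dimensional illustrative sketch with an initial SRS (hypothetical population; GSACS's systematic first stage is not reproduced here):

```python
import math
import random

def ht_total_acs(y, n0, rng):
    """Horvitz-Thompson estimate of the population total under adaptive
    cluster sampling with an initial SRS of size n0 (1-D neighbourhood).
    A network of size m_k intersected by the initial sample contributes its
    total divided by pi_k = 1 - C(N - m_k, n0) / C(N, n0)."""
    N = len(y)
    nets, i = [], 0
    while i < N:                       # maximal runs of y > 0; zeros singleton
        j = i + 1
        if y[i] > 0:
            while j < N and y[j] > 0:
                j += 1
        nets.append(range(i, j))
        i = j
    initial = set(rng.sample(range(N), n0))
    total = 0.0
    for members in nets:
        if initial & set(members):     # network intersected by initial sample
            m = len(members)
            pi = 1 - math.comb(N - m, n0) / math.comb(N, n0)
            total += sum(y[k] for k in members) / pi
    return total
```

Averaging over repeated draws confirms design-unbiasedness for the total, the property the paper derives for the GSACS versions of these estimators.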
Defense techniques for machine learning are critical yet challenging because the number and variety of attacks on widely applied machine learning algorithms are increasing significantly. Among these attacks, the poisoning attack, which disturbs machine learning algorithms by injecting poisoning samples, poses the greatest threat. In this paper, we analyze the characteristics of poisoning samples and propose a novel sample evaluation method, tailored to those characteristics, to defend against poisoning attacks. To capture intrinsic data characteristics from heterogeneous aspects, we first evaluate the training data by multiple criteria, each reformulated from a spectral clustering. Then, we integrate the multiple evaluation scores generated by these criteria through the proposed multiple spectral clustering aggregation (MSCA) method. Finally, we use the unified score as the indicator of poisoning attack samples. Experimental results on intrusion detection datasets show that MSCA significantly outperforms K-means outlier detection in terms of data legality evaluation and poisoning attack detection.
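The aggregation idea, combining several per-sample evaluation scores into one indicator, can be sketched generically by rank-normalizing each criterion and averaging. This is a plain stand-in, not the authors' MSCA; the flagging fraction is a hypothetical parameter:

```python
def aggregate_scores(score_lists, flag_fraction=0.1):
    """Combine several per-sample suspicion scores: rank-normalise each
    criterion to [0, 1], average across criteria, and flag the samples
    with the highest unified score as suspected poisoning samples."""
    n = len(score_lists[0])
    unified = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: scores[i])
        for rank, i in enumerate(order):
            unified[i] += rank / (n - 1)      # rank-normalised score in [0, 1]
    unified = [u / len(score_lists) for u in unified]
    k = max(1, int(flag_fraction * n))
    flagged = set(sorted(range(n), key=lambda i: -unified[i])[:k])
    return unified, flagged
```

Rank normalization makes heterogeneous criteria commensurable before averaging, so a sample must look suspicious under several criteria at once to receive a high unified score.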
Background: Immunization averts a large number of child deaths each year. The burden of vaccine-preventable diseases remains high in developing countries compared to developed countries, and different immunization programs have been implemented to reduce it. For better immunization coverage in developing countries, considerable progress is still needed in improving knowledge and awareness of the importance of vaccines. In this study, a comparison of immunization coverage under two sampling methods has been performed. Methods: The variance and design effect of the proportion of children vaccinated with different vaccines (BCG, OPV, DPT, Hepatitis B, Hib, Measles, and MMR) are estimated under two-stage (30 × 30) cluster sampling and systematic sampling in order to compare the two survey sampling methods. The homogeneity of clusters has also been tested using a chi-square test. Results: BCG, OPV, and DPT vaccination coverage is above 90%, whereas Hepatitis B, Measles, Hib, and MMR coverage is only between 50% and 64%. Systematic random sampling proved more complicated to implement than two-stage (30 × 30) cluster sampling. The results also show that the clusters are homogeneous with respect to the proportion of children vaccinated. Conclusion: There is no significant difference between the two survey methodologies in the point estimation of vaccination coverage, but the estimated variances of vaccination coverage are smaller under two-stage (30 × 30) cluster sampling than under systematic sampling. Very little improvement in full vaccination coverage has been observed relative to the previous study. From this study it can be concluded that two-stage (30 × 30) cluster sampling is preferable to systematic sampling and simple random sampling.
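The design effect reported in such comparisons is the ratio of the variance of the estimated proportion under the cluster design to the variance of a simple random sample of the same total size. A sketch for a proportion from equal-size clusters (data hypothetical, not the study's):

```python
def design_effect(clusters):
    """Design effect for a proportion estimated from n equal-size clusters
    of m binary outcomes: ratio of the cluster-design variance (between-
    cluster variance of the cluster proportions, divided by n) to the SRS
    variance p(1-p)/(n*m) for the same total sample size."""
    n = len(clusters)
    m = len(clusters[0])
    props = [sum(c) / m for c in clusters]
    p = sum(props) / n
    var_cluster = sum((pi - p) ** 2 for pi in props) / (n * (n - 1))
    var_srs = p * (1 - p) / (n * m)
    return var_cluster / var_srs
```

Homogeneous clusters (every cluster near the overall proportion) drive the design effect toward zero, while fully segregated clusters push it toward the cluster size, matching the abstract's observation that homogeneity favours the 30 × 30 cluster design.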
Studies in tobacco fields were conducted in 1993. The results showed that the spatial distribution pattern of the larvae was aggregated, and the degree of aggregation did not change with larval population density. Vertically, more larvae occurred on the lower leaves of tobacco plants than on the upper leaves. The difference in population density between tobacco fields at elevations of 490 m and 900 m was not significant. Required sample sizes at different precision levels were derived using a two-stage sampling technique. The average leaf area loss caused by the larvae in tobacco fields was 12.654 cm².
The aim of this paper is to compare sample quality across two probability samples and one sample that combines probabilistic cluster sampling with random route and quota sampling within the selected clusters to define the ultimate survey units. All of them use the face-to-face interview as the survey procedure. The hypothesis tested is that a combination of random route sampling and quota sampling (with substitution) can achieve the same degree of representativeness as household sampling (without substitution) based on the municipal register of inhabitants. We found marked differences in the age and gender distributions of the probability samples, where the deviations exceed 6%. A different picture emerges when comparing the employment variables, where quota sampling overestimates the economic activity rate (2.5%) and the unemployment rate (8%) and underestimates the employment rate (3.46%).
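Representativeness checks of this kind reduce to comparing sample category shares against known population margins, such as census age and gender distributions. A minimal sketch (the figures in the test are hypothetical, not the paper's):

```python
def max_deviation(sample_counts, population_shares):
    """Largest absolute percentage-point gap between a sample's category
    shares and known population shares (e.g. register or census margins)."""
    total = sum(sample_counts.values())
    return max(abs(100 * sample_counts[k] / total - 100 * population_shares[k])
               for k in population_shares)
```

Applying this to each demographic and employment variable yields deviation figures directly comparable to the percentages quoted in the abstract.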
If the population is rare and clustered, then simple random sampling gives a poor estimate of the population total. For such populations, adaptive cluster sampling is useful, but it loses control over the final sample size, so the cost of sampling can increase substantially. To overcome this problem, surveyors often use auxiliary information that is easy and inexpensive to obtain, and an attempt is made to control the final sample size through that information. In this article, we propose a two-stage negative adaptive cluster sampling design, a new design that combines two-stage sampling and negative adaptive cluster sampling. In this design, we consider an auxiliary variable that is highly negatively correlated with the variable of interest, with the auxiliary information completely known. In the first stage, an initial random sample is drawn using the auxiliary information. Then, using Thompson's (J Am Stat Assoc 85:1050-1059, 1990) adaptive procedure, networks in the population are discovered; these networks serve as the primary-stage units (PSUs). In the second stage, random samples of unequal sizes are drawn from the PSUs to obtain the secondary-stage units (SSUs). The values of the auxiliary variable and the variable of interest are recorded for these SSUs. A regression estimator is proposed to estimate the population total of the variable of interest. A new estimator, the composite Horvitz-Thompson (CHT)-type estimator, based only on information on the variable of interest, is also proposed. Variances of the two estimators, along with their unbiased estimators, are derived. Using the proposed methodology, a sample survey was conducted in the Western Ghats of Maharashtra, India. The performance of these estimators and the methodology is compared with other existing methods, and a cost-benefit analysis is given.
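The regression estimator mentioned here, in its classical single-phase form, adjusts the expansion of the sample mean by the sample slope times the gap between the known auxiliary total and its expanded sample estimate. A sketch of that classical form (not the paper's two-stage version; data hypothetical):

```python
def regression_total(y, x, x_total_pop, N):
    """Regression estimator of the population total of y using an auxiliary
    variable x with known population total X: Y_hat = N*y_bar + b*(X - N*x_bar),
    where b is the sample least-squares slope of y on x (b < 0 when the
    auxiliary variable is negatively correlated with y)."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return N * ybar + b * (x_total_pop - N * xbar)
```

When y is an exact linear function of x, the estimator recovers the population total exactly, which is why a strongly (negatively) correlated auxiliary variable makes it efficient.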
This work was carried out with the objective of proposing changes to Strand's sampling method, in which trees are selected in sampling units with probability proportional to their diameter for the calculation of stand density and basal area, and proportional to their height for the calculation of volume per hectare. Data used to evaluate the efficiency of Strand sampling in clusters were collected in stands of Pinus elliottii Engelm. located in a National Forest in Rio Grande do Sul State, Brazil. In the course of this research it was proposed to convert the sampling unit into a cluster, structurally more efficient for obtaining consistent estimates of volume and dominant height, using volumetric equivalence, which results in a form factor equal to one for the final calculation of volume per hectare, together with an indirect method for obtaining Lorey's mean height. The objectives of this study were achieved because, with this methodology, it is not necessary to measure tree heights in the sampling unit, except one dominant height per cluster to evaluate sites. The development of independent estimators for basal area and volume gave rise to an estimator for Lorey's mean height that requires no tree-height measurement in the sample. The proposed methodology is an attractive solution for reducing costs in forest inventories, with the potential for greater accuracy and broader information at the compartment level, without increasing sampling cost compared with fixed-area units. The use of smaller permanent sampling units at higher intensity in the compartments before the final cut will substantially increase the precision of the estimators in these management units, making it possible to eliminate the pre-cut inventory in forest enterprises.
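Selection with probability proportional to size, the ingredient Strand's method applies with diameter and height as the size measures, can be sketched with a Hansen-Hurwitz PPS-with-replacement estimator. Illustrative only; the tree list is hypothetical and this is not Strand's full estimator:

```python
import random

def pps_total(sizes, values, n, rng):
    """Hansen-Hurwitz PPS-with-replacement estimate of sum(values): draw n
    units with probability proportional to `sizes` (e.g. tree diameter) and
    average values[i] / p_i over the draws."""
    s = sum(sizes)
    probs = [d / s for d in sizes]
    draws = rng.choices(range(len(sizes)), weights=sizes, k=n)
    return sum(values[i] / probs[i] for i in draws) / n
```

When the attribute of interest is proportional to the size measure used for selection, every draw yields the same expanded value and the estimator has zero variance; this is the efficiency argument for matching the selection probability (diameter, height) to the target quantity (basal area, volume).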
Motif-based graph local clustering (MGLC) algorithms are generally designed within a two-phase framework, which computes the motif weight for each edge beforehand and then runs a local clustering algorithm on the weighted graph to output the result. Despite being correct, this framework has limitations on both practical and theoretical fronts and is less applicable in real interactive situations. This research develops a purely local and index-adaptive method, Index-adaptive Triangle-based Graph Local Clustering (TGLC+), to solve the MGLC problem with respect to triangles. TGLC+ adaptively combines the approximate Monte Carlo method Triangle-based Random Walk (TRW) and the deterministic brute-force method Triangle-based Forward Push (TFP) to estimate the personalized PageRank (PPR) vector without calculating the exact triangle-weighted transition probabilities, and then outputs the clustering result by running the standard sweep procedure. This paper establishes the efficiency of TGLC+ through theoretical analysis and demonstrates its effectiveness through extensive experiments. To our knowledge, TGLC+ is the first method to solve the MGLC problem without computing the motif weights beforehand, thus achieving better efficiency with comparable effectiveness. TGLC+ is suitable for large-scale and interactive graph-analysis tasks, including visualization, system optimization, and decision-making.
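The two standard ingredients named here, a forward-push approximation of the personalized PageRank vector and a conductance sweep, can be sketched on a plain unweighted graph. Illustrative only; TGLC+'s triangle weighting and its TRW/TFP switching are not reproduced:

```python
from collections import defaultdict

def forward_push_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank by forward push: repeatedly push
    residual mass at any node whose residual exceeds eps * degree, keeping
    an alpha fraction and spreading the rest evenly to neighbours."""
    p = defaultdict(float)
    r = defaultdict(float)
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        if r[u] < eps * len(adj[u]):
            continue                       # stale or below-threshold entry
        ru = r[u]
        r[u] = 0.0
        p[u] += alpha * ru
        share = (1 - alpha) * ru / len(adj[u])
        for v in adj[u]:
            r[v] += share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    return p

def sweep_cut(adj, p):
    """Standard sweep: order touched nodes by p(u)/deg(u) and return the
    prefix set S with the lowest conductance cut(S)/min(vol(S), vol(V\\S))."""
    vol_total = sum(len(adj[u]) for u in adj)
    order = sorted(p, key=lambda u: -p[u] / len(adj[u]))
    prefixes = order[:-1] if len(order) > 1 else order
    best, best_phi = set(), float("inf")
    S, cut, vol = set(), 0, 0
    for u in prefixes:
        vol += len(adj[u])
        cut += len(adj[u]) - 2 * sum(1 for v in adj[u] if v in S)
        S.add(u)
        phi = cut / min(vol, vol_total - vol)
        if phi < best_phi:
            best_phi, best = phi, set(S)
    return best, best_phi
```

On a toy graph of two triangles joined by a single edge, pushing from a seed in one triangle and sweeping recovers that triangle as the best local cluster, which is the behaviour the triangle-weighted variants sharpen on larger graphs.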
文摘Two-stage adaptive cluster sampling and two-stage conventional sampling designs were used to estimate population total of Fringe-Eared Oryx that are clustered and sparsely distributed. The study region was Amboseli-West Kilimanjaro and Magadi-Natron cross boarder landscape between Kenya and Tanzania. The study region was partitioned into different primary sampling units with different secondary sampling units that were of different sizes. Results show that two-stage adaptive cluster sampling design is efficient compared to simple random sampling and the conventional two- stage sampling design. The design is less variable compared to the conventional two-stage sampling design.
基金Supported by the Open Researches Fund Program of L IESMARS(WKL(0 0 ) 0 30 2 )
文摘Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and etc. We combine sampling technique with DBSCAN algorithm to cluster large spatial databases, and two sampling based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces sampling technique inside DBSCAN, and the other uses sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large scale spatial databases.
文摘Adaptive cluster sampling (ACS) has been a very important tool in estimation of population parameters of rare and clustered population. The fundamental idea behind this sampling plan is to decide on an initial sample from a defined population and to keep on sampling within the vicinity of the units that satisfy the condition that at least one characteristic of interest exists in a unit selected in the initial sample. Despite being an important tool for sampling rare and clustered population, adaptive cluster sampling design is unable to control the final sample size when no prior knowledge of the population is available. Thus adaptive cluster sampling with data-driven stopping rule (ACS’) was proposed to control the final sample size when prior knowledge of population structure is not available. This study examined the behavior of the HT, and HH estimator under the ACS design and ACS’ design using artificial population that is designed to have all the characteristics of a rare and clustered population. The efficiencies of the HT and HH estimator were used to determine the most efficient design in estimation of population mean in rare and clustered population. Results of both the simulated data and the real data show that the adaptive cluster sampling with stopping rule is more efficient for estimation of rare and clustered population than ordinary adaptive cluster sampling.
文摘Non-response is a regular occurrence in Sample Surveys. Developing estimators when non-response exists may result in large biases when estimating population parameters. In this paper, a finite population mean is estimated when non-response exists randomly under two stage cluster sampling with replacement. It is assumed that non-response arises in the survey variable in the second stage of cluster sampling. Weighting method of compensating for non-response is applied. Asymptotic properties of the proposed estimator of the population mean are derived. Under mild assumptions, the estimator is shown to be asymptotically consistent.
文摘In this paper, we study the estimators of the population mean in stratified adaptive cluster sampling by using the information of the auxiliary variable. Simulations showed that if the variable of interest (y) and the auxiliary variables (x,z) have high positive correlation then the estimate of the mean square error of the ratio estimators is less than the estimate of the mean square error of the product estimator. The estimators which use only one auxiliary variable were better than the estimators which use two auxiliary variables.
基金supported by the National Key Research and Development Program of China(2018YFB1003700)the Scientific and Technological Support Project(Society)of Jiangsu Province(BE2016776)+2 种基金the“333” project of Jiangsu Province(BRA2017228 BRA2017401)the Talent Project in Six Fields of Jiangsu Province(2015-JNHB-012)
文摘For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique(SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between minority class examples nearby. However, the SMOTE encounters the overgeneralization problem. The densitybased spatial clustering of applications with noise(DBSCAN) is not rigorous when dealing with the samples near the borderline.We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique(DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups, including core samples, borderline samples and noise samples, and then the noise samples of minority class is removed to synthesize more effective samples. In order to make full use of the information of core samples and borderline samples,different strategies are used to over-sample core samples and borderline samples. Experiments show that DSMOTE can achieve better results compared with SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
基金supported by the Fundamental Research Funds for the Central Universities (Grant No. 2021ZY04)the National Natural Science Foundation of China (Grant No. 32001252)the International Center for Bamboo and Rattan (Grant No. 1632020029)
文摘There are two distinct types of domains,design-and cross-classes domains,with the former extensively studied under the topic of small-area estimation.In natural resource inventory,however,most classes listed in the condition tables of national inventory programs are characterized as cross-classes domains,such as vegetation type,productivity class,and age class.To date,challenges remain active for inventorying cross-classes domains because these domains are usually of unknown sampling frame and spatial distribution with the result that inference relies on population-level as opposed to domain-level sampling.Multiple challenges are noteworthy:(1)efficient sampling strategies are difficult to develop because of little priori information about the target domain;(2)domain inference relies on a sample designed for the population,so within-domain sample sizes could be too small to support a precise estimation;and(3)increasing sample size for the population does not ensure an increase to the domain,so actual sample size for a target domain remains highly uncertain,particularly for small domains.In this paper,we introduce a design-based generalized systematic adaptive cluster sampling(GSACS)for inventorying cross-classes domains.Design-unbiased Hansen-Hurwitz and Horvitz-Thompson estimators are derived for domain totals and compared within GSACS and with systematic sampling(SYS).Comprehensive Monte Carlo simulations show that(1)GSACS Hansen-Hurwitz and Horvitz-Thompson estimators are unbiased and equally efficient,whereas thelatter outperforms the former for supporting a sample of size one;(2)SYS is a special case of GSACS while the latter outperforms the former in terms of increased efficiency and reduced intensity;(3)GSACS Horvitz-Thompson variance estimator is design-unbiased for a single SYS sample;and(4)rules-ofthumb summarized with respect to sampling design and spatial effect improve precision.Because inventorying a mini domain is analogous to inventorying a rare 
variable,alternative network sampling procedures are also readily available for inventorying cross-classes domains.
文摘The defense techniques for machine learning are critical yet challenging due tothe number and type of attacks for widely applied machine learning algorithms aresignificantly increasing. Among these attacks, the poisoning attack, which disturbsmachine learning algorithms by injecting poisoning samples, is an attack with the greatestthreat. In this paper, we focus on analyzing the characteristics of positioning samples andpropose a novel sample evaluation method to defend against the poisoning attack cateringfor the characteristics of poisoning samples. To capture the intrinsic data characteristicsfrom heterogeneous aspects, we first evaluate training data by multiple criteria, each ofwhich is reformulated from a spectral clustering. Then, we integrate the multipleevaluation scores generated by the multiple criteria through the proposed multiplespectral clustering aggregation (MSCA) method. Finally, we use the unified score as theindicator of poisoning attack samples. Experimental results on intrusion detection datasets show that MSCA significantly outperforms the K-means outlier detection in terms ofdata legality evaluation and poisoning attack detection.
文摘Background: Immunization averts a large number of children in each year. The burden of vaccine preventable diseases remains high in developing countries compared to developed countries. To overcome from this burden different types of immunization programs have been implemented. For better immunization coverage in developing countries, considerable progress is to be made to improve the knowledge and awareness regarding importance of vaccines. In this study a compara-tive study of immunization coverage under two sampling methods has been performed. Methods: In this study variance and design effect of proportion of children vaccinated against different types of vaccines (BCG, OPV, DPT, Hepatitis B, Hib, Measles and MMR) are estimated under two stage (30 × 30) cluster and systematic sampling for comparison of these two survey sampling methods. Also the homogeneity of clusters has been tested by using chi-square test. Results: It is observed that BCG, OPV and DPT vaccination coverage is more than 90% whereas Hepatitis B, Measles, Hib and MMR vaccination coverage is between 50% - 64% only. Here systematic random sampling is more complicated than two stage (30 × 30) cluster sampling. Also the result shows that the clusters are homogeneous with respect to proportion of children vaccinated. Conclusion: There is no significant difference between the two survey methodologies regarding the point estimation of vaccination coverage but estimation of variances of vaccination coverage is less in two stage (30 × 30) cluster sampling than that of the systematic sampling. Also the clusters are homogeneous. Very less improvement has been observed in case of fully vaccination coverage than the previous study. From the study it can be said that two stage (30 × 30) cluster sampling will be preferred to systematic sampling and simple random sampling method.
文摘Studies in tobacco fields were conducted in 1993. The results showed that the distribution pattern of the larva was aggregative,and the aggregation did not change with the densities of population of the larva. The characteristics of the vertical distribution of the larva on tobacco plants was more in the lower leaves than in the upper. The difference of population density among the tobacco fields with an elevation of 490 meters and 900 meters was not significant. The number of sampling was given under different precisions by using two-stage sampling technique. The average of leaf area loss caused by the larva in tobacco fields was 12.654 cm2.
Abstract: The aim of this paper is to compare sample quality across two probability samples and one sample that uses probabilistic cluster sampling combined with random route and quota sampling within the selected clusters to define the ultimate survey units. All three use face-to-face interviews as the survey procedure. The hypothesis tested is that a combination of random route sampling and quota sampling (with substitution) can achieve the same degree of representativeness as household sampling (without substitution) based on the municipal register of inhabitants. We found marked differences in the age and gender distributions of the probability samples, with deviations exceeding 6%. A different picture emerges for the employment variables, where quota sampling overestimates the economic activity rate (2.5%) and the unemployment rate (8%) and underestimates the employment rate (3.46%).
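The representativeness check described above boils down to comparing sample margins with known population margins and flagging deviations beyond a threshold. The age-group figures below are invented for illustration, not the study's data.

```python
# Hypothetical sketch: deviation of a sample's age-group distribution
# from census margins, flagging groups beyond a 6-point threshold.
census = {"18-34": 0.30, "35-54": 0.38, "55+": 0.32}
sample = {"18-34": 0.24, "35-54": 0.40, "55+": 0.36}

deviations = {g: sample[g] - census[g] for g in census}
max_dev = max(abs(d) for d in deviations.values())
flagged = [g for g, d in deviations.items() if abs(d) >= 0.06]
print(flagged, round(max_dev, 2))
```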
Abstract: If a population is rare and clustered, simple random sampling gives a poor estimate of the population total; for such populations, adaptive cluster sampling is useful. However, it loses control over the final sample size, so the cost of sampling can increase substantially. To overcome this problem, surveyors often use auxiliary information that is easy to obtain and inexpensive, and here an attempt is made to control the final sample size through such information. In this article, we propose the two-stage negative adaptive cluster sampling design, a new design that combines two-stage sampling with negative adaptive cluster sampling. It uses an auxiliary variable that is highly negatively correlated with the variable of interest and whose values are completely known. In the first stage, an initial random sample is drawn using the auxiliary information, and networks in the population are discovered using Thompson's (J Am Stat Assoc 85:1050-1059, 1990) adaptive procedure. These networks serve as the primary-stage units (PSUs). In the second stage, random samples of unequal sizes are drawn from the PSUs to obtain the secondary-stage units (SSUs), and the values of the auxiliary variable and the variable of interest are recorded for these SSUs. A regression estimator is proposed to estimate the population total of the variable of interest, along with a new Composite Horvitz-Thompson (CHT)-type estimator based only on information on the variable of interest. Variances of these two estimators, together with their unbiased estimators, are derived. Using the proposed methodology, a sample survey was conducted in the Western Ghats of Maharashtra, India. The performance of the estimators and the methodology is compared with existing methods, and a cost-benefit analysis is given.
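The Horvitz-Thompson building block underlying the CHT-type estimator can be sketched as follows: each distinct sampled network's total is inflated by the inverse of its inclusion probability. The network totals and probabilities below are illustrative values, not figures from the survey.

```python
# Minimal sketch of a Horvitz-Thompson-type total estimate over networks
# (PSUs) discovered by the adaptive step; all numbers are hypothetical.
def ht_total(network_totals, inclusion_probs):
    """HT estimator: sum of y_i / pi_i over distinct sampled networks."""
    return sum(y / p for y, p in zip(network_totals, inclusion_probs))

y = [12.0, 7.0, 30.0]   # totals of the variable of interest per network
pi = [0.4, 0.25, 0.5]   # first-stage inclusion probabilities
print(ht_total(y, pi))  # unbiased for the population total
```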
Abstract: This work proposes changes to Strand's sampling method, in which trees are selected in sampling units with probability proportional to diameter for calculating stand density and basal area, and proportional to height for calculating volume per hectare. Data used to evaluate the efficiency of Strand sampling in clusters were collected in stands of Pinus elliottii Engelm. located in a National Forest in Rio Grande do Sul State, Brazil. The sampling unit was converted into a cluster, structurally more efficient for obtaining consistent estimates of volume and dominant height, using volumetric equivalence, which yields a form factor equal to one for the final calculation of volume per hectare and an indirect method for obtaining Lorey's mean height. The study's objectives were achieved: with this methodology it is not necessary to measure tree heights in the sampling unit, except one dominant height per cluster to evaluate sites. The development of independent estimators for basal area and volume led to an estimator for Lorey's mean height that requires no tree-height measurements in the sample. The proposed methodology is an attractive way to reduce costs in forest inventories, with the potential for greater accuracy and compartment-level information, without increasing sampling cost relative to fixed-area units. Using smaller permanent sampling units at higher intensity in the compartments before the final cut will substantially increase the precision of the estimators in these management units, making it possible to eliminate the pre-cut inventory in forest enterprises.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2020JS005).
Abstract: Motif-based graph local clustering (MGLC) algorithms are generally designed within a two-phase framework, which computes the motif weight for each edge beforehand and then runs a local clustering algorithm on the weighted graph to output the result. Despite its correctness, this framework has both practical and theoretical limitations and is less applicable in real interactive situations. This research develops a purely local and index-adaptive method, Index-adaptive Triangle-based Graph Local Clustering (TGLC+), to solve the MGLC problem with respect to triangles. TGLC+ adaptively combines the approximate Monte-Carlo method Triangle-based Random Walk (TRW) and the deterministic brute-force method Triangle-based Forward Push (TFP) to estimate the Personalized PageRank (PPR) vector without calculating the exact triangle-weighted transition probabilities, and then outputs the clustering result via the standard sweep procedure. This paper establishes the efficiency of TGLC+ through theoretical analysis and demonstrates its effectiveness through extensive experiments. To our knowledge, TGLC+ is the first method to solve the MGLC problem without computing the motif weights beforehand, thus achieving better efficiency with comparable effectiveness. TGLC+ is suitable for large-scale and interactive graph analysis tasks, including visualization, system optimization, and decision-making.
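To illustrate the two building blocks that TGLC+ adapts, here is a hedged sketch of plain edge-based forward push (not the triangle-weighted TFP/TRW variants) for approximating a PPR vector, followed by the standard sweep cut; the toy graph, alpha, and eps values are arbitrary choices for the example.

```python
# Sketch: forward-push PPR approximation + sweep cut on an unweighted
# graph. TGLC+ works with triangle-weighted transitions; this uses
# ordinary edge transitions purely to show the two phases.
from collections import defaultdict

def forward_push(graph, seed, alpha=0.15, eps=1e-4):
    """Approximate the PPR vector by pushing residual mass (lazy walk)."""
    p, r = defaultdict(float), defaultdict(float)
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        deg = len(graph[u])
        if r[u] < eps * deg:
            continue
        ru = r[u]
        p[u] += alpha * ru                 # settle mass at u
        r[u] = (1 - alpha) * ru / 2        # lazy self-loop keeps half
        share = (1 - alpha) * ru / (2 * deg)
        if r[u] >= eps * deg:
            queue.append(u)
        for v in graph[u]:                 # spread the rest to neighbors
            below = r[v] < eps * len(graph[v])
            r[v] += share
            if below and r[v] >= eps * len(graph[v]):
                queue.append(v)
    return p

def sweep(graph, p):
    """Order nodes by p(u)/deg(u); return the best-conductance prefix."""
    order = sorted(p, key=lambda u: p[u] / len(graph[u]), reverse=True)
    total_vol = sum(len(vs) for vs in graph.values())
    best, best_phi, S, vol, cut = set(), float("inf"), set(), 0, 0
    for u in order:
        S.add(u)
        vol += len(graph[u])
        cut += sum(-1 if v in S else 1 for v in graph[u])
        denom = min(vol, total_vol - vol)
        phi = cut / denom if denom else 1.0
        if phi < best_phi:
            best_phi, best = phi, set(S)
    return best

# Two triangles joined by one bridge edge; the sweep recovers the
# seed's triangle as the best-conductance local cluster.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster = sweep(graph, forward_push(graph, seed=0))
print(sorted(cluster))
```

TGLC+'s contribution is estimating the triangle-weighted analogue of this PPR vector on the fly (mixing TRW sampling with TFP pushes) instead of precomputing motif weights before the push phase.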