Statistical optimal sample size algorithm based on feature selection (cited by: 3)
Abstract: The statistical optimal sample size algorithm runs inefficiently when it is used to determine the sample size for sampling large datasets, and high-dimensional datasets in particular. In addition, the attributes (dimensions) of a high-dimensional dataset differ in importance, and some of them may be redundant. To address these problems, this paper proposes a statistical optimal sample size algorithm based on feature selection. Building on entropy theory, the algorithm constructs an entropy measure based on the similarity between objects to evaluate the importance of each feature. It then obtains a subset of important features using a purpose-designed selection criterion, and finally runs the statistical optimal sample size algorithm on that feature subset. Experimental results show that samples drawn at the sample size given by the improved algorithm achieve relatively high accuracy in clustering, while the execution time of the algorithm is noticeably reduced. This verifies that the improved algorithm is effective and feasible.
Source: Application Research of Computers (CSCD; Peking University Core Journal), 2014, No. 12, pp. 3535-3538, 3549 (5 pages)
Funding: National Natural Science Foundation of China (61103129, 61202312); Jiangsu Province Science and Technology Support Program (BE2009009)
Keywords: statistical optimal sample size algorithm; high-dimensional datasets; feature selection; entropy; clustering
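The abstract names the pipeline's steps but not its formulas. The following is a minimal sketch of the first two steps under stated assumptions, not the paper's method.

* **Entropy measure.** A common similarity-based entropy from the unsupervised feature-selection literature stands in for the paper's own measure: S_ij = exp(-alpha * D_ij), with alpha chosen so that the mean distance maps to similarity 0.5.
* **Selection criterion.** `select_features` applies a hypothetical "keep every feature whose removal raises the entropy" rule in place of the paper's criterion.
* **Names.** All function names here are illustrative.
* **Omitted step.** The third step, running the statistical optimal sample size algorithm on `X[:, selected]`, is not reproduced because the abstract does not describe it.

```python
import numpy as np


def similarity_entropy(X: np.ndarray) -> float:
    """Similarity-based entropy of a dataset (rows = objects, columns = features).

    ASSUMPTION: a common measure from unsupervised feature selection, not
    necessarily the one used in the paper:
        S_ij = exp(-alpha * D_ij),  alpha = -ln(0.5) / mean(D)
        E    = -sum_{i<j} [S_ij ln S_ij + (1 - S_ij) ln(1 - S_ij)]
    Clear cluster structure pushes similarities toward 0 or 1 (low entropy);
    structureless data pushes them toward 0.5 (high entropy).
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    # Euclidean distances for each unordered pair i < j
    dist = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    mean_dist = dist.mean()
    if mean_dist == 0.0:  # all objects identical: no structure to measure
        return 0.0
    alpha = -np.log(0.5) / mean_dist  # maps the average distance to similarity 0.5
    s = np.clip(np.exp(-alpha * dist), 1e-12, 1.0 - 1e-12)  # keep log() finite
    return float(-(s * np.log(s) + (1.0 - s) * np.log(1.0 - s)).sum())


def feature_importance(X: np.ndarray) -> np.ndarray:
    """Importance of feature f = entropy without f minus entropy with all features.

    Positive: dropping f blurs the structure, so f is useful.
    Negative: dropping f sharpens the structure, so f looks like noise or redundancy.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    Xn = (X - lo) / span  # min-max scale so no single feature dominates distances
    base = similarity_entropy(Xn)
    return np.array([
        similarity_entropy(np.delete(Xn, f, axis=1)) - base
        for f in range(X.shape[1])
    ])


def select_features(X: np.ndarray) -> list[int]:
    """HYPOTHETICAL criterion: keep every feature whose removal raises the entropy.

    Falls back to the single highest-scoring feature if none qualifies.
    """
    imp = feature_importance(X)
    keep = [int(f) for f in np.flatnonzero(imp > 0)]
    return keep if keep else [int(np.argmax(imp))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 30)
    informative = labels[:, None] + rng.normal(0, 0.05, size=(60, 2))  # two clusters
    noise = rng.uniform(0, 1, size=(60, 2))  # structureless columns
    X = np.hstack([informative, noise])
    selected = select_features(X)
    print(selected)
    # A sample-size step would then run on X[:, selected] (not implemented here).
```

Each call to `similarity_entropy` computes all pairwise distances. This naive version therefore costs O(n²d) per call and O(n²d²) for the full ranking. It is meant to show the logic on small data, not to be fast on the large datasets the paper targets.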
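Related literature: 18 references; 157 second-level references; 103 articles sharing references with this one; 43 articles co-cited with it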

  • 1蒋琰,茅宁.多元资本结构在中国企业的实证研究[J].中国工业经济,2007(1):78-85. 被引量:18
  • 2奚国泉,蔡军,钟甫宁.人力资本驱动的公司价值[J].人口与经济,2002(S1):113-115. 被引量:1
  • 3洪茹燕,吴晓波.国外企业智力资本研究述评[J].外国经济与管理,2005,27(10):42-48. 被引量:22
  • 4毛勇,周晓波,夏铮,尹征,孙优贤.特征选择算法研究综述[J].模式识别与人工智能,2007,20(2):211-218. 被引量:95
  • 5Andrew Y N. Feature selection 11 vs. 12 regularization, and rotational invariance [ C ]//Proc of the 21st International Conference on Machine Learning. 2004 : 78- 85.
  • 6Jain A K, Duin R P W, Mao Jianchang. Statistical pattern recognition : a review[J]. IEEE Trans on Pattern Analysis and Machine Intel- ligence,2000,22( 1 ) :4-37.
  • 7Peng Hanchuan, Long Fuhui, Ding C. Feature selection based on mu- tual information : criteria of max-dependency, max-relevance and min- redundancy[ J]. IEEE Trans on Pattern Analysis and Machine In- telligence ,2005,2 (8) : 1226-1238.
  • 8Kononenko I. Estimating attributes:analysis and extension of RELIEF [C]//Proc of the 7th European Conference on Machine Learning. 1994 : 171-182.
  • 9Cpver T M,Thomas J A. Elements of information theory[ M]. 2nd ed. [ S. 1. ] : Wiley-Interscienee,2006.
  • 10Zhou Feng,Torredf. Canonical time warping for alignment of human behavior[ C ]//Advances in Neural Information Processing Systems. 2009 : 2286 - 2294.

Citing literature: 3 citing articles; 9 second-level citing articles