
优化的覆盖算法在文本挖掘中的应用研究 (cited by: 3)

Study of the Optimized Covering Algorithm and Its Application in Text Mining
Abstract: This paper first analyzes two main shortcomings of the covering algorithm: the coarse classification boundary causes a high probability that test samples are rejected, and when the resulting covers overlap, the class of a test sample that falls in the overlap is hard to determine. To address the second shortcoming, the authors apply quotient-space-based granular computing theory and perform a second recognition pass on the samples that were misclassified because of overlapping covers. By reducing the granularity at which samples are recognized, the cover granularity moves from coarse to fine, so the misclassified samples are recognized progressively in a smaller space, which raises the recognition rate. Finally, the optimized covering algorithm is applied to a preprocessed (word-segmented) Chinese text database. The experimental results show that the optimized method reduces the number of misclassified samples, lowers the error rate in recognition, and effectively improves classification accuracy.
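The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' actual procedure, so everything here is an assumption: covers are modeled as spheres (`Cover` with a center, radius, label, and the training points it contains), the function name `classify` is hypothetical, and the second pass is a stand-in that uses a nearest-neighbor search restricted to the training points of the overlapping covers, not the quotient-space refinement used in the paper. The sketch only shows the control flow: reject when no cover contains the sample, accept when all containing covers agree, and re-examine the sample in a smaller space when covers of different classes overlap.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cover:
    """A spherical cover (assumed representation, not taken from the paper)."""
    center: tuple
    radius: float
    label: str
    members: list = field(default_factory=list)  # training points inside the cover


def classify(x, covers):
    """Return (label, stage), where stage is 'first', 'second', or 'rejected'."""
    # First pass: coarse granularity, test membership in each cover.
    hits = [c for c in covers if math.dist(x, c.center) <= c.radius]
    if not hits:
        return None, "rejected"  # no cover contains x
    if len({c.label for c in hits}) == 1:
        return hits[0].label, "first"  # all containing covers agree

    # Second pass: x lies in overlapping covers of different classes.
    # Stand-in for the finer-granularity step: look only at the training
    # points of the overlapping covers and take the nearest one.
    candidates = [(math.dist(x, p), c.label) for c in hits for p in c.members]
    if not candidates:
        # No stored members: fall back to the nearest cover center.
        candidates = [(math.dist(x, c.center), c.label) for c in hits]
    return min(candidates, key=lambda t: t[0])[1], "second"


# Toy two-class example with made-up 2-D points.
covers = [
    Cover((0.0, 0.0), 2.0, "sports", [(0.0, 0.0), (1.0, 0.0)]),
    Cover((3.0, 0.0), 2.0, "finance", [(3.0, 0.0), (2.0, 0.0)]),
]
for point in [(-1.0, 0.0), (1.8, 0.0), (10.0, 10.0)]:
    print(point, classify(point, covers))
```

In this toy data, the point `(1.8, 0.0)` lies inside both covers, so only the four stored members of those two covers are consulted. This is the sense in which the second decision is made "in a smaller space" than the full training set.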
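Authors: 周瑛, 牛浏
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2014, No. 11X, pp. 8065-8069 (5 pages)
Funding: An interim result of the Ministry of Education Humanities and Social Sciences Fund project "Research on Text Mining Technology Based on Granular Computing Theory" (Project No. 11YJA870032)
Keywords: cross of coverage; granular computing theory; text mining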