
Retrieval of the Most Similar Myanmar Document Using Content Mining
Abstract: This paper studies text mining of Myanmar script and proposes a new Myanmar word segmentation algorithm and a word-stemming algorithm. Under the vector space model, the Okapi similarity measure is used to evaluate the relevance between Myanmar documents and query keywords, and a Myanmar document retrieval system built on these algorithms is implemented. Experimental results show that the proposed algorithms can mine HTML documents on the Web quickly and effectively.
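The abstract names the Okapi measure but not its exact form or parameters. As an illustration only, the sketch below ranks documents with the widely used Okapi BM25 weighting, score(D, Q) = Σ over query terms t of IDF(t) · f(t, D) · (k1 + 1) / (f(t, D) + k1 · (1 − b + b · |D| / avgdl)). Several things here are assumptions not taken from the paper: the defaults k1 = 1.2 and b = 0.75, the Lucene-style IDF with "+ 1" inside the logarithm, the hypothetical class name `OkapiBM25`, and the premise that documents and queries arrive already segmented and stemmed into token lists.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class OkapiBM25:
    """Hypothetical sketch: rank pre-segmented documents with Okapi BM25.

    Assumes segmentation and stemming were already applied upstream;
    k1, b and the IDF variant are common defaults, not the paper's values.
    """

    def __init__(self, docs: Sequence[List[str]], k1: float = 1.2, b: float = 0.75):
        self.k1 = k1
        self.b = b
        self.tfs = [Counter(doc) for doc in docs]   # term frequencies f(t, D)
        self.lengths = [len(doc) for doc in docs]   # document lengths |D|
        self.n_docs = len(docs)
        self.avgdl = sum(self.lengths) / self.n_docs if self.n_docs else 0.0
        df: Counter = Counter()
        for tf in self.tfs:
            df.update(tf.keys())                    # document frequency n(t)
        # Lucene-style IDF: the "+ 1.0" keeps weights non-negative for common terms.
        self.idf: Dict[str, float] = {
            t: math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0) for t, n in df.items()
        }

    def score(self, query: List[str], i: int) -> float:
        """BM25 relevance of document i to the query tokens."""
        tf, dl = self.tfs[i], self.lengths[i]
        # Length normalisation: longer-than-average documents are penalised via b.
        norm = self.k1 * (1.0 - self.b + self.b * dl / self.avgdl) if self.avgdl else self.k1
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:  # terms absent from the document contribute nothing
                total += self.idf[term] * f * (self.k1 + 1.0) / (f + norm)
        return total

    def rank(self, query: List[str]) -> List[int]:
        """Return document indices ordered from most to least relevant."""
        scores = [self.score(query, i) for i in range(self.n_docs)]
        return sorted(range(self.n_docs), key=lambda i: scores[i], reverse=True)


docs = [
    ["myanmar", "text", "mining"],
    ["okapi", "similarity", "retrieval"],
    ["myanmar", "myanmar", "retrieval"],
]
model = OkapiBM25(docs)
print(model.rank(["myanmar", "retrieval"]))
```

In this toy corpus the third document ranks first because it contains both query terms and repeats one of them; the saturation term (k1 + 1) / (f + norm) means the repeat raises the score less than proportionally. A real Myanmar system would feed the token lists produced by its own segmenter and stemmer.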
Source: Journal of Huaqiao University (Natural Science), 2013, No. 5, pp. 521-524 (4 pages). Indexed in CAS; Peking University core journal.
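Funding: Fundamental Research Funds for the Central Universities; Research Fund of the Overseas Chinese Affairs Office of the State Council (09QZR02)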
Keywords: Myanmar word; text mining; vector space model; text retrieval; Okapi similarity

References: 8; secondary references: 29
Co-citing articles: 33; co-cited articles: 10
Citing articles: 2; secondary citing articles: 8