A Review of Research on Web Text Classification Technology (Web文本分类技术研究现状述评) Cited by: 7

A Survey on Web Text Classification
Abstract: Building on an analysis of the current state of research on Web text classification methods in China and abroad, this article examines in depth several recently proposed approaches: swarm-based classification methods, a text classification model based on fuzzy-rough sets, multi-classifier fusion, a text classification model based on RBF networks, and latent semantic classification models, along with recent developments in the k-nearest neighbor (K-NN) algorithm and support vector machines (SVM). It then analyzes several key technologies in the Web text classification process: text preprocessing, text representation, feature dimensionality reduction, training methods, and classification algorithms. Finally, it summarizes development trends in Web text classification technology, namely the continual emergence of new classification methods, the further development of traditional methods, and the convergence of text, speech, and image classification techniques, and identifies shortcomings in current research, such as the word segmentation problem and the fact that no "best" feature selection method has yet been found.
Author: Gao Shuqin (高淑琴)
Source: Documentation, Information & Knowledge (《图书情报知识》), CSSCI, Peking University Core Journal, 2008, No. 3, pp. 81-86 (6 pages)
Keywords: Web text classification; data mining; machine learning