Abstract
Based on an analysis of the current state of research on Web text classification methods in China and abroad, this paper examines in depth several recently proposed approaches: swarm-based classification methods, text classification models based on fuzzy-rough sets, multi-classifier fusion methods, text classification models based on RBF networks, and latent semantic classification models, as well as recent developments in the K-nearest-neighbor (K-NN) algorithm and the support vector machine (SVM). It then analyzes several key technologies in the Web text classification process: text preprocessing, text representation, feature dimensionality reduction, training methods, and classification algorithms. Finally, it summarizes the development trends of Web text classification technology, namely the continuing emergence of new classification methods, the further development of traditional classification methods, and the fusion of text, speech, and image classification technologies, together with the shortcomings of current research, such as the word segmentation problem and the fact that no "best" feature selection method has yet been found.
Source
《图书情报知识》
CSSCI
Peking University Core Journals (北大核心)
2008, No. 3, pp. 81-86 (6 pages)
Documentation, Information & Knowledge