Abstract
Automatic text categorization is an effective way to improve the efficiency and quality of information utilization. An uneven distribution of the training texts degrades categorization results, yet in practice an evenly distributed training set is hard to obtain. To address this problem, we present an improved k-NN text categorization method. Experiments on an English and a Chinese text set show that the improved method not only raises classification precision but also exhibits better stability.
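The abstract does not say what the improvement consists of. For reference only, here is a minimal sketch of a plain similarity-weighted k-NN text classifier. It uses bag-of-words vectors and cosine similarity, and it shows how an uneven training set can sway the vote. The `balance` flag is a hypothetical mitigation chosen for illustration: it divides each class's vote by that class's number of training documents. It is not the authors' method, and all function names and sample data are assumptions.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, train, k=3, balance=False):
    """Similarity-weighted k-NN vote over (text, label) training pairs.

    balance=True is a hypothetical illustration (not the paper's method):
    each class's summed similarity is divided by its training-set size,
    so a large class cannot win merely by supplying more neighbours.
    """
    q = tf_vector(query)
    # Rank all training documents by similarity to the query (stable sort).
    ranked = sorted(
        ((cosine(q, tf_vector(text)), label) for text, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in ranked[:k]:
        votes[label] += sim
    if balance:
        sizes = Counter(label for _, label in train)
        for label in votes:
            votes[label] /= sizes[label]
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical, deliberately uneven training set: 4 vs 1 documents.
    train = [
        ("match goal team score", "sports"),
        ("team goal match win", "sports"),
        ("goal team season match", "sports"),
        ("team coach match goal", "sports"),
        ("stock market goal price", "finance"),
    ]
    query = "stock market team goal"
    print(knn_classify(query, train, k=3))                # sports
    print(knn_classify(query, train, k=3, balance=True))  # finance
```

The closest neighbour (similarity 0.75) belongs to the small class. The plain vote still goes to the large class because its two remaining neighbours contribute 0.5 each. This is the kind of bias the abstract attributes to an uneven training distribution.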
Source
Journal of the China Society for Scientific and Technical Information (情报学报)
CSSCI
Peking University Core Journal (北大核心)
2007, Issue 1, pp. 56-59 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 70371004)
Keywords
text classification
information retrieval
k-NN
algorithm