Abstract
To address the shortcomings of the K-nearest-neighbor (KNN) algorithm under the vector space model (VSM) in text processing, this paper proposes an improved KNN text categorization method based on self-organizing map (SOM) neural network theory, feature selection, and pattern aggregation. Feature selection and pattern aggregation are applied to reduce the dimensionality of the feature space. Because the traditional VSM gives every dimension the same weight, which does not suit text processing, the paper proposes using an SOM neural network to compute the weight of each VSM dimension. Combining the two improvements effectively reduces the dimensionality of the vector space and improves both the accuracy and the speed of text categorization.
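The role of per-dimension weights in KNN can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the SOM produces the weights, so the `weights` list below is simply assumed as given, and the function names, toy vectors, and labels are all hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with a separate weight on each dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_classify(query, docs, labels, weights, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: weighted_distance(query, docs[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy term-weight vectors over three features (hypothetical data).
docs = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.8], [0.0, 1.0, 0.3], [0.1, 0.9, 0.7]]
labels = ["sports", "sports", "finance", "finance"]

# Assumed weights: in the paper these would come from the SOM step;
# here the third feature is just down-weighted by hand.
weights = [1.0, 1.0, 0.1]

print(knn_classify([0.8, 0.2, 0.5], docs, labels, weights, k=3))
```

Setting every weight to 1.0 recovers the equal-weight distance that the abstract describes as ill-suited to text; dropping a dimension entirely corresponds to a weight of 0.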
Source
《情报科学》 (Information Science)
CSSCI
PKU Core (北大核心)
2005, Issue 4, pp. 550-554 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (60275020)