Abstract
Text classification is an important part of text mining and is widely used in many fields. To classify Chinese text, the feature vector of each Chinese document is first obtained using word segmentation and the TF-IDF algorithm, and a Chinese text classifier is then built with a BP (back-propagation) neural network. Experiments show that although a BP neural network suffers from problems such as a long training period and slow convergence when applied to Chinese text classification, both its classification speed and its classification accuracy are still quite high. Using a BP neural network is therefore a good method for Chinese text classification.
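The abstract names the pipeline (word segmentation, then TF-IDF feature vectors, then a BP network) without giving its formulas or network settings. The following is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: documents are assumed to be already segmented into word lists (the segmenter is not specified); TF is raw count divided by document length and IDF is log(N/df), one common variant that may differ from the paper's; vectors are L2-normalised; the classifier is a single-hidden-layer sigmoid network trained by online back-propagation on squared error. The names `fit_idf`, `tfidf`, `BPNetwork`, the hidden-layer size, learning rate, and the toy corpus are all hypothetical.

```python
import math
import random


def fit_idf(docs):
    """Build a sorted vocabulary and IDF table from pre-segmented documents."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Assumed IDF variant: log(N / df); a word occurring in every document gets weight 0.
    idf = {w: math.log(n / sum(1 for d in docs if w in d)) for w in vocab}
    return vocab, idf


def tfidf(doc, vocab, idf):
    """Map a segmented document to an L2-normalised TF-IDF feature vector."""
    if not doc:
        return [0.0] * len(vocab)
    vec = [doc.count(w) / len(doc) * idf[w] for w in vocab]  # assumed TF = count / length
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Hypothetical one-hidden-layer sigmoid network trained by back-propagation (online SGD)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Each weight row has one extra entry used as the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, out

    def train_step(self, x, target):
        """One forward/backward pass on a single example; returns its squared error."""
        xb, hb, out = self.forward(x)
        # Output deltas for E = 1/2 * sum((t - y)^2) with sigmoid units.
        d_out = [(t - y) * y * (1 - y) for t, y in zip(target, out)]
        # Propagate deltas back to the hidden units (bias entry excluded),
        # using the output weights before they are updated.
        d_hid = [hb[j] * (1 - hb[j]) * sum(d * self.w2[k][j] for k, d in enumerate(d_out))
                 for j in range(len(hb) - 1)]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * d_hid[j] * xb[i]
        return 0.5 * sum((t - y) ** 2 for t, y in zip(target, out))

    def predict(self, x):
        out = self.forward(x)[2]
        return max(range(len(out)), key=out.__getitem__)


# Toy corpus (hypothetical): already-segmented documents, 0 = finance, 1 = sports.
docs = [
    ["股票", "市场", "上涨", "投资"],
    ["基金", "投资", "收益", "市场"],
    ["球队", "比赛", "进球", "联赛"],
    ["比赛", "球员", "冠军", "联赛"],
]
labels = [0, 0, 1, 1]

vocab, idf = fit_idf(docs)
X = [tfidf(d, vocab, idf) for d in docs]
net = BPNetwork(len(vocab), n_hidden=4, n_out=2, lr=0.5)
losses = []
for epoch in range(2000):
    total = 0.0
    for x, y in zip(X, labels):
        total += net.train_step(x, [1.0 if k == y else 0.0 for k in range(2)])
    losses.append(total)

print([net.predict(x) for x in X])
print(net.predict(tfidf(["股票", "投资", "市场"], vocab, idf)))
```

In this sketch the IDF table is fitted on the training corpus only, so a word unseen during training contributes no feature of its own when a new document is vectorised (it still counts toward document length). The many repeated epochs in the training loop also illustrate the long training period the abstract mentions; a real Chinese corpus would need an actual segmenter and a much larger vocabulary.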
Source
Computer Era (《计算机时代》)
2015, No. 11, pp. 58-61 (4 pages)
Keywords
Chinese text classification
BP neural network
Chinese word segmentation
document feature vector