Abstract
This paper proposes a text classification method based on keyword learning. An LDA topic model extracts the keywords of each text, a feature matrix is constructed from the bag of words over those keywords, and PCA reduces its dimensionality. The reduced, low-order feature matrix is fed into a hybrid network consisting of a convolutional neural network and a BP neural network, which learns the text classification. To improve classification performance, a deep neural network with the same structure as the BP neural network is introduced to initialize the weights of the BP neural network. Experiments on multiple data sets show that the proposed method markedly improves text classification accuracy.
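The feature-construction step described in the abstract can be illustrated with a minimal sketch. It assumes the keywords have already been extracted (the paper uses an LDA topic model for this) and covers only the bag-of-words count matrix over those keywords. The function name `keyword_feature_matrix`, the whitespace tokenization, and the toy documents are hypothetical choices for illustration, not details from the paper.

```python
from collections import Counter


def keyword_feature_matrix(docs, keywords):
    """Build a count matrix: one row per document, one column per keyword."""
    keyword_set = set(keywords)
    matrix = []
    for doc in docs:
        # Naive whitespace tokenization (an assumption; the paper does not
        # specify its tokenizer). Only tokens that are keywords are counted.
        counts = Counter(tok for tok in doc.lower().split() if tok in keyword_set)
        matrix.append([counts[kw] for kw in keywords])
    return matrix


# Hypothetical toy data; in the paper the keywords come from an LDA topic model.
docs = [
    "the network learns weights and the network converges",
    "stocks fell as the market reacted to rates",
]
keywords = ["network", "weights", "market", "rates"]

features = keyword_feature_matrix(docs, keywords)
for row in features:
    print(row)
```

In the pipeline the abstract outlines, a matrix of this kind would then be reduced with PCA before being passed to the hybrid network. Those later steps are outside this sketch.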
Authors
王天时
张龙
刘怀泉
刘丽
陈思琦
Wang Tianshi; Zhang Long; Liu Huaiquan; Liu Li; Chen Siqi (School of Information Science and Engineering, Shandong Normal University, 250358, Jinan, China; Shandong Chaoyue Digital Control Electronics Co., Ltd., 250013, Jinan, China)
Source
《山东师范大学学报(自然科学版)》
CAS
2019, No. 1, pp. 54-60 (7 pages)
Journal of Shandong Normal University (Natural Science)
Funding
National Natural Science Foundation of China (61702310)
Keywords
extraction
bag of words
feature matrix
convolutional neural network
BP neural network