Abstract
Text classification methods based on the vector space model produce high-dimensional, highly sparse text representations with weak feature expressiveness, and their feature engineering relies on costly manual extraction. To address these problems, this paper proposes a text classification algorithm that uses a convolutional capsule network with dual-channel word vectors. Word vectors trained by Word2Vec and context word vectors extended for the specific text classification task serve as the two input channels of the neural network, and a convolutional capsule network model with a dynamic routing mechanism performs the classification. Experimental results on multiple English datasets show that the dual-channel word vector training approach outperforms the single-channel strategy, and that the proposed algorithm achieves higher classification accuracy than LSTM, RAE, MV-RNN, and other algorithms.
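The abstract names two mechanisms without detailing them: two word-vector channels as input, and dynamic routing between capsule layers. The sketch below illustrates both in plain Python under stated assumptions. It follows the generic routing-by-agreement scheme from the capsule network literature and is not taken from the paper; the function names (`squash`, `dynamic_routing`, `stack_channels`), the three routing iterations, and the toy values are all hypothetical choices made for illustration.

```python
import math


def squash(v):
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0.0:
        return [0.0 for _ in v]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement (generic form, assumed here).

    u_hat[i][j] is the prediction vector that lower capsule i makes
    for upper capsule j. Returns one output vector per upper capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each lower capsule distributes itself
        # over the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


def stack_channels(static_vecs, context_vecs):
    """Pair the two embeddings of each token: result[t] = [static_t, context_t]."""
    return [[s, c] for s, c in zip(static_vecs, context_vecs)]


# Toy predictions: three lower capsules, two upper capsules, 2-D vectors.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[1.0, 0.1], [0.0, 0.2]],
]
outputs = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in v)) for v in outputs]
```

In this toy input all three lower capsules agree on the first upper capsule, so its output vector ends up longer than that of the second. In a capsule classifier the vector length is commonly read as the class score.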
Authors
KANG Yan (康雁)
LI Jinyuan (李晋源)
YANG Qiyue (杨其越)
CUI Guorong (崔国荣)
WANG Peiyao (王沛尧)
School of Software, Yunnan University, Kunming 650500, China
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2019, Issue 11, pp. 177-182 (6 pages)
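Funding
National Natural Science Foundation of China (61762092)
Open Fund of the Key Laboratory of Software Engineering of Yunnan Province (2017SE204)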
Keywords
dual-channel word vectors
convolutional capsule network
dynamic routing mechanism
text classification
feature representation