Abstract
In response to the demand for accurate text sentiment analysis in modern cultural studies, this paper studies word-vector generation and training methods in natural language processing (NLP). A hierarchical Softmax structure is adopted to alleviate the matrix sparsity that arises as the vector dimension grows in word-vector representation. Negative sampling is introduced at the output layer of this structure to reduce training time; in addition, the model's original binary tree structure is replaced with a deep convolutional neural network to improve generalization. Simulation experiments on SemEval2013 show that, for sentiment analysis of English text, the model recognizes negative sentiment with relatively high accuracy but recognizes neutral text poorly. Owing to its deeper convolutional structure, the proposed CNN-Softmax model improves performance significantly: Accuracy and F1 reach 84.3% and 82.3%, respectively, about 5% higher than the traditional binary-tree-based model.
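For readers unfamiliar with the negative sampling mentioned in the abstract, the following is a minimal sketch of the standard word2vec-style negative-sampling loss for a single training pair. It is not the paper's implementation. The function and argument names (`negative_sampling_loss`, `center`, `positive`, `negatives`) are hypothetical, and the vectors are plain Python lists chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(center, positive, negatives):
    """Standard negative-sampling loss for one (center, context) pair.

    Assumption: this is the usual word2vec formulation, not the paper's code.
    center:    vector of the center word
    positive:  output vector of the observed context word
    negatives: output vectors of k sampled "noise" words
    """
    # Reward a high score for the observed context word.
    loss = -math.log(sigmoid(dot(positive, center)))
    # Penalize high scores for each sampled negative word.
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss
```

Only the k sampled negatives are scored for each training pair, rather than every word in the vocabulary, which is why this kind of objective reduces training time compared with a full Softmax output layer.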
Author
薛雨
XUE Yu (School of Humanities, Shangluo University, Shangluo 726000, China)
Source
Electronic Design Engineering (《电子设计工程》)
2021, No. 13, pp. 95-99 (5 pages)
Funding
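Shaanxi Higher Education Society, 2020 (XGH20280)
Shangluo University Key Project of Education and Teaching Reform Research, 2020 (20jyjx102)
Open Project of the Shangluo Culture and Jia Pingwa Research Center, Shangluo University (18SLWH06)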