Abstract
To improve the classification of electronic texts, and to address the parameter-estimation problem that models assuming independent and identically distributed (i.i.d.) data face when labeled data are scarce, this paper proposes a higher-order path naive Bayes text classification algorithm based on Nesterov smoothing. First, a text classification model in higher-order path form is built on the conventional naive Bayes event model. This model uses the implicit link information in higher-order paths to improve classification performance. Second, the Laplace smoothing used in the naive Bayes event model relies on a second-order difference process that tends to lose information and amplify noise. To address this, an improved higher-order path naive Bayes text classification algorithm based on Nesterov smoothing is proposed. Finally, experiments on benchmark data sets and on library electronic-text classification verify the effectiveness of the proposed algorithm.
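For reference, the baseline the abstract criticizes is naive Bayes with Laplace smoothing. The sketch below shows the conventional multinomial naive Bayes estimator with additive (Laplace) smoothing, P(w|c) = (n_{w,c} + α) / (n_c + α|V|), where α = 1 gives add-one smoothing. This is an illustrative assumption, not the authors' code. The function names `train_multinomial_nb` and `predict` and the toy documents are hypothetical, and the sketch implements neither the higher-order path model nor the Nesterov smoothing proposed in the paper.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with additive (Laplace) smoothing.

    Hypothetical helper for illustration. docs is a list of token lists and
    labels is a list of class labels. alpha=1.0 is classic add-one smoothing.
    """
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)                 # documents per class
    token_counts = defaultdict(Counter)          # class -> token -> count
    for doc, y in zip(docs, labels):
        token_counts[y].update(doc)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(token_counts[c].values())
        denom = total + alpha * len(vocab)       # smoothed normalizer
        log_lik[c] = {t: math.log((token_counts[c][t] + alpha) / denom)
                      for t in vocab}
    return log_prior, log_lik, vocab

def predict(model, doc):
    """Return the class with the highest log-posterior score for doc."""
    log_prior, log_lik, vocab = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for tok in doc:
            if tok in vocab:                     # ignore words unseen in training
                score += log_lik[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)

# Toy usage with made-up data.
docs = [["cheap", "loan", "offer"], ["loan", "rate", "offer"],
        ["library", "book", "catalog"], ["book", "loan", "library"]]
labels = ["finance", "finance", "library", "library"]
model = train_multinomial_nb(docs, labels)
print(predict(model, ["library", "loan"]))  # → library
```

Because every vocabulary word receives α pseudo-counts, no conditional probability is zero, which is the property that makes Laplace smoothing the usual choice when labeled data are sparse. The paper argues that this estimator has drawbacks of its own and replaces it with Nesterov smoothing.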
Authors
邓广彪
黄振功
岳晓光
DENG Guang-biao; HUANG Zheng-gong; YUE Xiao-guang (School of Mathematics and Computer Sciences, Guangxi Normal University for Nationalities, Chongzuo, Guangxi 532200, China; Department of Engineering Management, Wuhan University, Wuhan 430070, China)
Source
Journal of Southwest China Normal University (Natural Science Edition)
CAS
Peking University Core Journal
2018, No. 7, pp. 107-112 (6 pages)
Funding
2015 Guangxi Higher Education Institutions Science and Technology Research Project (KY2015LX539)