Abstract
Text labels, a form of text keyword, simplify the extraction of useful information from science and technology policy documents. Working from the categories of science and technology policy, this paper divides labels into four classes: science and technology investment, intellectual property, rural science and technology, and taxation. To address the shortcomings of the traditional SVM algorithm and the imbalance of the label data, it incorporates the idea of Euclidean distance and proposes ESVM, a classification method for science and technology policy text labels that includes a penalty factor. Finally, a comparison of the SVM and ESVM classification methods verifies the effectiveness of the proposed method on science and technology policy text label data.
Authors
吴峰
李银生
聂永川
范通让
赵文彬
张博
WU Feng; LI Yin-sheng; NIE Yong-chuan; FAN Tong-rang; ZHAO Wen-bin; ZHANG Bo (Institute of Scientific and Technical Information of Hebei Province, Shijiazhuang, Hebei 050021, China; School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei 050043, China)
Source
《河北省科学院学报》
CAS
2018, Issue 1, pp. 1-10 (10 pages)
Journal of The Hebei Academy of Sciences
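Funding
National Natural Science Foundation of China (No. 61373160)
Science and Technology Support Program of the Hebei Provincial Department of Science and Technology (Nos. 17210113D, 179676334D)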
Keywords
Text label classification
Science and technology policy
SVM
Unbalanced data