Abstract
This paper proposes an automatic patent classification method based on a back-propagation neural network (BPN). Feature terms are extracted from a set of patent documents by Chinese word segmentation and weighted according to their frequency in each document, so that every patent is represented as a feature-term vector. To improve BPN training, the chi-square (χ²) statistic is used to reduce the dimensionality of the feature vectors, and the trained BPN classifier then classifies the patent documents. Patents under International Patent Classification (IPC) class H02 were used as test data, and the results show good classification accuracy and efficiency.
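The feature-reduction and weighting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the documents are already segmented into tokens, scores each term by its highest chi-square value over all categories, and uses relative term frequency as the weight. The function names, the toy tokens, and the subclass labels `H02K` and `H02J` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def chi_square(docs, labels, term, category):
    """Chi-square statistic for one (term, category) pair (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term (or category) occurs in all or no documents
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all categories."""
    vocab = sorted({t for tokens in docs for t in tokens})
    categories = sorted(set(labels))
    scores = {
        t: max(chi_square(docs, labels, t, c) for c in categories) for t in vocab
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k]


def to_vector(tokens, features):
    """Represent one document as relative term frequencies over the kept features."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[f] / total for f in features]


# Toy, already-segmented documents (assumption: real input would come from
# a Chinese word segmenter applied to patent text).
docs = [
    ["motor", "rotor", "device"],
    ["motor", "stator", "device"],
    ["battery", "charging", "device"],
    ["battery", "circuit", "device"],
]
labels = ["H02K", "H02K", "H02J", "H02J"]

features = select_features(docs, labels, k=2)
vectors = [to_vector(tokens, features) for tokens in docs]
```

In this toy data the term "device" appears in every document, so its chi-square score is zero and it is dropped. The resulting reduced vectors are what would be fed to a classifier such as a BPN.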
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 23, pp. 5075-5078 (4 pages)
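Funding
National Science and Technology Support Program of the 11th Five-Year Plan (2006BAH03B03)
Fundamental Research Funds for the Central Universities (YX2010-30)
Key Work Fund of the Institute of Scientific and Technical Information of China (200KP01-3-1)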