Abstract
To protect patents and raise the success rate of patent applications, a similar-patent detection technique based on an improved vector space model (VSM) is proposed. The improved VSM incorporates a recurrent neural network (RNN), which processes the text as a sequence so that the resulting word representations account for each word's order and context within the document. Patent texts are preprocessed using globally unique identifiers (GUIDs), and a Chinese word segmentation system divides the Chinese text into meaningful words. The improved VSM is then used to measure the similarity between patent documents and to identify similar sentences, thereby detecting similar patents. The improved model was applied to real patent retrieval and compared with the traditional VSM. The results show that the improved VSM achieves both a higher DCG (discounted cumulative gain) value and a higher accuracy than the traditional VSM.
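The abstract compares against a traditional VSM and evaluates retrieval with DCG. As a rough illustration of that baseline and metric only (not the authors' RNN-improved model, whose details are not given here), the sketch below builds TF-IDF vectors for documents that are assumed to be already segmented into words, scores pairs with cosine similarity, and computes DCG over a ranked list of relevance grades. The TF-IDF weighting, the smoothed IDF, and the function names `tfidf_vectors`, `cosine`, and `dcg` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each pre-segmented document (a list of terms) to a sparse
    TF-IDF vector stored as a dict {term: weight}.
    Assumption: the paper does not state its term weighting scheme."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append({
            # Length-normalized TF times a smoothed IDF (assumed variant).
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


if __name__ == "__main__":
    # Toy documents, assumed to be the output of a word segmenter.
    docs = [
        ["patent", "similarity", "vector", "model"],
        ["patent", "vector", "space", "model"],
        ["neural", "network", "sequence"],
    ]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))  # shared terms, so a positive score
    print(cosine(vecs[0], vecs[2]))  # no shared terms, so 0.0
    print(dcg([3, 2, 3, 0, 1, 2]))   # graded relevance of a ranked result list
```

Because cosine similarity ignores vector length, the per-document TF normalization does not change the scores; it only keeps the weights on a comparable scale. A bag-of-words vector like this ignores word order, which is the gap the abstract says the RNN is meant to address.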
Authors
XIA Qingjie, YOU Caihong, ZHAO Yingjie
(CRRC Qingdao Sifang Rolling Stock Co., Ltd., Qingdao 266111, Shandong, China; Baoding Dawei Computer Software Development Co., Ltd., Baoding 071000, Hebei, China)
Source
Adhesion (《粘接》)
2024, No. 11, pp. 193-196 (4 pages)
Indexed in: CAS
Keywords
vector space model
recurrent neural network
similar patent detection