Abstract
Targeting the patent data domain and taking into account the characteristics of patent documents and the requirements of patent analysis, this paper proposes IRPU, an improved method for detecting approximately duplicate attributes and records in patent data, based on the RFMA and PCM algorithms. IRPU is applied to patent data to detect approximate duplicates in the inventor attribute and in whole records. Experimental results, including a comparison with previous work, indicate that the method is well suited to patent data and achieves higher identification accuracy.
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals
2010, No. 12, pp. 46-51 (6 pages)
New Technology of Library and Information Service
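Funding
One of the research outcomes of the China Postdoctoral Science Foundation funded project "Research on a Patent Analysis System for Strategic Technology Management" (Grant No. 20100470389).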
Keywords
Data cleaning
Approximately duplicate records
Approximately duplicate attributes
Position coding
Patent