The Method of Patent Data Approximately Duplicate Attributes and Records Detecting Based on IRPU Algorithm (基于IRPU算法的专利数据相似重复属性及记录检测方法)
Cited by: 2
Abstract: Targeting the patent data domain, and taking into account the characteristics of patent documents and the requirements of patent analysis, this paper proposes the IRPU algorithm, an improved method for detecting approximately duplicate attributes and records in patent data that builds on the RFMA and PCM algorithms. The algorithm is applied to patent data to detect approximate duplicates in the inventor attribute and in whole records. Experimental comparison with previous work indicates that the proposed method suits the patent data domain and achieves higher identification accuracy.
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2010, No. 12, pp. 46-51 (6 pages)
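Funding: One of the research outcomes of the China Postdoctoral Science Foundation project "Research on a Patent Analysis System for Strategic Technology Management" (Grant No. 20100470389)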
Keywords: data cleaning; approximately duplicate records; approximately duplicate attributes; position coding; patent
