A genetic algorithm based entity resolution approach with active learning (Cited by: 1)

Abstract: Entity resolution is a key aspect of data quality and data integration: it identifies which records in data sources correspond to the same real-world entity. Many existing approaches require manually designed match rules to solve the problem, which requires domain knowledge and is time-consuming. We propose a novel genetic algorithm based entity resolution approach via active learning. It is able to learn effective match rules by logically combining several different attributes' comparisons with proper thresholds. We use active learning to reduce the amount of manually labeled data and speed up the learning process. The extensive evaluation shows that the proposed approach outperforms the state-of-the-art entity resolution approaches in accuracy.
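To illustrate what a match rule that logically combines attribute comparisons with thresholds might look like, here is a minimal sketch. The abstract does not say how the paper encodes its rules, so everything here is an assumption made for illustration: the attribute names (`name`, `city`, `phone`), the threshold values, the nested-tuple rule encoding, and the use of Python's `difflib.SequenceMatcher` as the similarity measure. The genetic search over rules and the active learning loop are not shown.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical rule: (name >= 0.8 AND city >= 0.9) OR phone >= 1.0.
# A leaf is ("cmp", attribute, threshold); inner nodes are "and" / "or".
RULE = (
    "or",
    ("and", ("cmp", "name", 0.8), ("cmp", "city", 0.9)),
    ("cmp", "phone", 1.0),
)


def matches(rule, r1: dict, r2: dict) -> bool:
    """Evaluate a rule tree on a pair of records."""
    op = rule[0]
    if op == "cmp":
        _, attr, threshold = rule
        return similarity(r1[attr], r2[attr]) >= threshold
    results = (matches(sub, r1, r2) for sub in rule[1:])
    return all(results) if op == "and" else any(results)
```

In a setup like this, a rule-learning procedure would search over the tree shape, the attributes compared, and the thresholds, rather than having a person write `RULE` by hand.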
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2017, Issue 1, pp. 147-159 (13 pages). Frontiers of Computer Science in China (English edition)
Funding: The authors thank the anonymous reviewers for their inspiring doubts and helpful suggestions during the reviewing process. This work was supported by the National Basic Research Program of China (973 Program) (2012CB316201), the Fundamental Research Funds for the Central Universities (N 120816001) and the National Natural Science Foundation of China (Grant Nos. 61472070, 61402213).
Keywords: entity resolution, genetic algorithm, active learning, data quality, data integration