4 Rohit Ananthakrishna, Surajit Chaudhuri, Venkatesh Ganti. Eliminating Fuzzy Duplicates in Data Warehouses[C]. In: VLDB, 2002: 586-597.
5 Luis Gravano, Panagiotis G Ipeirotis, H V Jagadish et al. Using q-grams in a DBMS for Approximate String Processing[J]. IEEE Data Eng Bull, 2001; 24(4): 28-34.
6 Ricardo A Baeza-Yates, Berthier A Ribeiro-Neto. Modern Information Retrieval[M]. ACM Press/Addison-Wesley, 1999.
7 Alvaro E Monge, Charles Elkan. An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records[C]. In: DMKD, 1997.
8 M Hernandez, S Stolfo. Real-world data is dirty: Data cleansing and the merge/purge problem[J]. Data Mining and Knowledge Discovery, 1997, 2(1).
9 Erhard Rahm, Hong Hai Do. Data Cleaning: Problems and Current Approaches[J]. IEEE Data Eng Bull, 2000; 23(4): 3-13.
10 Mauricio A Hernández, Salvatore J Stolfo. The Merge/Purge Problem for Large Databases[C]. In: SIGMOD Conference, 1995: 127-138.