
Cleaning Method for Approximately Duplicate Records Based on Fuzzy Comprehensive Evaluation (cited by: 3)
Abstract Cleaning approximately duplicate records is of great importance for improving data quality in a data warehouse, and field matching is one of the most commonly used detection algorithms. To address the excessive subjectivity with which attribute weights are determined in this algorithm, a method is proposed that determines attribute levels through multi-user fuzzy comprehensive evaluation and computes attribute weights from the users' evaluation results. On this basis, each attribute is further split into atoms; atom similarities are computed and combined into attribute similarities, and the records are finally judged for duplication. Experimental results show that the method reflects the importance of attributes fairly objectively, and that splitting attributes into atoms before duplicate judgment further improves detection accuracy.
Source Journal of Beijing Information Science and Technology University (Natural Science Edition), 2017, No. 4, pp. 59-63 (5 pages)
Funding Natural Science Foundation of Fujian Province (2015J01653); Fujian Jiangxia University Young Research Talent Cultivation Fund (JXZ2014011)
Keywords approximately duplicate records; attribute; fuzzy comprehensive evaluation; algorithm
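
The pipeline outlined in the abstract (multi-user evaluation, attribute weights, atom splitting, attribute similarity, duplicate judgment) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the four importance levels and their scores in `LEVEL_SCORES`, the delimiter set used to split atoms, the use of `difflib.SequenceMatcher` as the atom similarity, the symmetric best-match averaging, and the `0.8` threshold. None of the function names come from the paper.

```python
import re
from difflib import SequenceMatcher

# Assumed scores for four importance levels (very important ... unimportant).
# The paper's actual level set and scoring are not given in the abstract.
LEVEL_SCORES = [1.0, 0.75, 0.5, 0.25]


def attribute_weights(votes):
    """Turn per-level user vote counts into normalized attribute weights.

    votes maps an attribute name to a list of counts, one per level.
    """
    scores = {}
    for attr, counts in votes.items():
        total = sum(counts)
        # Fuzzy membership vector: share of users choosing each level.
        membership = [c / total for c in counts]
        scores[attr] = sum(m * s for m, s in zip(membership, LEVEL_SCORES))
    norm = sum(scores.values())
    return {attr: s / norm for attr, s in scores.items()}


def atoms(value):
    """Split an attribute value into lowercase atoms (assumed delimiters)."""
    return [a for a in re.split(r"[\s,.;:/\-]+", value.lower()) if a]


def atom_similarity(a, b):
    """Stand-in atom similarity in [0, 1]; not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def _mean_best_match(src, dst):
    # For each atom in src, take its best match in dst, then average.
    return sum(max(atom_similarity(a, b) for b in dst) for a in src) / len(src)


def attribute_similarity(x, y):
    """Symmetric average of best atom matches between two attribute values."""
    ax, ay = atoms(x), atoms(y)
    if not ax or not ay:
        return 1.0 if ax == ay else 0.0
    return (_mean_best_match(ax, ay) + _mean_best_match(ay, ax)) / 2


def record_similarity(r1, r2, weights):
    """Weighted sum of attribute similarities over the weighted attributes."""
    return sum(w * attribute_similarity(r1[a], r2[a]) for a, w in weights.items())


def is_duplicate(r1, r2, weights, threshold=0.8):
    """Judge two records as duplicates above an assumed threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Hypothetical data: ten users grade three attributes across the four levels.
votes = {"name": [8, 2, 0, 0], "address": [2, 5, 3, 0], "phone": [1, 2, 4, 3]}
weights = attribute_weights(votes)

rec_a = {"name": "Zhang Wei", "address": "12 Xueyuan Road, Fuzhou", "phone": "0591-8353 1234"}
rec_b = {"name": "Zhang Wei", "address": "No.12 Xueyuan Rd. Fuzhou", "phone": "0591 83531234"}
similarity = record_similarity(rec_a, rec_b, weights)
```

In this sketch the weights come from aggregated ratings by several users rather than from a single analyst's choice, which is the subjectivity concern the abstract raises. Comparing atoms rather than whole strings lets reordered or abbreviated fragments still contribute to the attribute score.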
