Abstract
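Data quality has always been a critical issue for information resource management in every domain. Real-world data often suffers from a wide variety of problems, ranging from simple spelling errors to complex semantic inconsistencies. The goal of data cleaning is to detect and remove the errors and inconsistencies present in data and thereby improve its quality. This article summarizes the current state of research on data cleaning and proposes a definition of a data-cleaning framework oriented toward multiple data sources. The framework implements concepts and techniques such as a term model, processing description files, and a shared library.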
Data quality is crucial for information management systems in all domains. Real-world data is often dirty because of various data quality problems, which range from simple data entry errors to complex inconsistencies. Data cleaning deals with detecting and removing errors and inconsistencies from data in order to improve its quality. After providing a classification of data quality problems and a survey of data cleaning, this article presents a specification of an extensible data-cleaning framework. The framework realizes features such as a term model, a processing description file, and a rule and dictionary base.
Source
《科技资讯》
2009, No. 1, pp. 13-15 (3 pages)
Science & Technology Information