Abstract
Because the real world is complex, missing and uncertain information is pervasive. Relational databases, as a tool for representing the real world, use null values to express missing information. To address null values in relational databases, this paper proposes a method for estimating multiple null values based on multi-table relationship information. The method first determines the order in which the null values of each column (attribute) are estimated, following the principle of introducing as little error as possible. It then estimates the null values of each column from the information in the column's own (base) table. When the prediction error exceeds a given threshold, it brings in information from other tables, choosing a different mode according to the form of the relationship between the base table and the related tables, to improve prediction accuracy. Experimental results show that the method estimates null values with higher accuracy than the other methods it is compared against.
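The abstract does not specify the paper's estimator, its exact ordering criterion, or its relationship-specific modes, so the following is only a minimal sketch of the overall control flow under stated assumptions. Single-predictor least-squares regression stands in for the paper's estimator (the fuzzy-clustering component suggested by the keywords is not reproduced). Ordering columns by fewest nulls first is an assumed proxy for the "introduce as little error as possible" principle. In-sample RMSE serves as the prediction error, and only a many-to-one foreign-key relationship is handled. All names (`impute`, `best_predictor`, `fit_line`, the `rel_` prefix, `demo_tables`) are hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant predictor: fall back to the mean of y
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def best_predictor(rows, target, candidates):
    """Return (rmse, column, a, b) for the candidate with the lowest in-sample RMSE."""
    best = None
    for col in candidates:
        pairs = [(r[col], r[target]) for r in rows
                 if r.get(col) is not None and r[target] is not None]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        a, b = fit_line(xs, ys)
        err = sqrt(mean((a + b * x - y) ** 2 for x, y in pairs))
        if best is None or err < best[0]:
            best = (err, col, a, b)
    return best


def impute(table, related, fk, threshold):
    """Fill None cells in `table` column by column, fewest nulls first.

    `related` maps a foreign-key value to one row of a related table
    (assumed many-to-one relationship); `fk` names that key column in `table`.
    """
    cols = list(table[0])
    nulls = {c: sum(r[c] is None for r in table) for c in cols}
    # Assumption: fewest-nulls-first approximates "introduce the least error".
    order = sorted((c for c in cols if nulls[c]), key=nulls.get)
    rel_cols = ["rel_" + k for k in next(iter(related.values()))]
    for target in order:
        rows = table
        local = [c for c in cols if c not in (target, fk)]
        best = best_predictor(table, target, local)
        if best is None or best[0] > threshold:
            # Local error too high: join in the related table's attributes via the key.
            joined = [{**r, **{"rel_" + k: v for k, v in related.get(r[fk], {}).items()}}
                      for r in table]
            remote = best_predictor(joined, target, rel_cols)
            if remote is not None and (best is None or remote[0] < best[0]):
                rows, best = joined, remote
        if best is None:
            continue  # no usable predictor anywhere; leave the nulls in place
        _, col, a, b = best
        for r, src in zip(table, rows):
            if r[target] is None and src.get(col) is not None:
                r[target] = a + b * src[col]
    return table


def demo_tables():
    """Hypothetical data: y tracks the customer's z, not the order's own x."""
    orders = [
        {"cust": 1, "x": 1.0, "y": 10.0},
        {"cust": 2, "x": 2.0, "y": 30.0},
        {"cust": 3, "x": 3.0, "y": 20.0},
        {"cust": 4, "x": 4.0, "y": None},
    ]
    customers = {1: {"z": 1.0}, 2: {"z": 3.0}, 3: {"z": 2.0}, 4: {"z": 5.0}}
    return orders, customers


if __name__ == "__main__":
    orders, customers = demo_tables()
    print(impute(orders, customers, "cust", threshold=1.0)[3]["y"])  # → 50.0
```

In the demo the local fit on `x` has an RMSE of about 7.07, which exceeds the threshold of 1.0, so the sketch joins in the customer attribute `z` and fills the null with 50.0. With `threshold=100.0` the local fit is accepted instead and the null becomes 30.0. Because the table is updated in place, columns filled earlier in the order can act as predictors for columns filled later.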
Source
Computer and Modernization (《计算机与现代化》)
2016, No. 6, pp. 117-122 (6 pages)
Funding
Open Fund of the Graduate Innovation Base (Laboratory), Nanjing University of Aeronautics and Astronautics (project kfjj201460)
Keywords
relational database
null value
fuzzy clustering
regression coefficient