Abstract
Knowledge reduction is an important step in data mining based on rough set theory, and finding the optimal knowledge reduction is a typical NP-hard problem. In practical applications, data attributes often carry cost constraints and the data itself contains noise. This paper proposes a method that combines rough set theory with a genetic algorithm (GA) to find the optimal and approximate knowledge reductions of such information tables. The experimental results show that the method has strong global search ability and can find the optimal reduction of an information table, under the criterion of either minimal cardinality or lowest cost, within a limited number of generations. When the information table contains noisy data, the method can also find an approximate knowledge reduction.
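The abstract does not give the chromosome encoding or the fitness function, so the following is only a minimal sketch of how a GA search for a low-cost reduct could look, not the authors' formulation. Everything named here is an assumption made for illustration: the bit-string encoding, the `dependency` measure (the fraction of objects in the positive region), the threshold `beta` used to accept approximate reductions on noisy tables, the fitness shape, and the toy table.

```python
import random

def dependency(rows, attrs):
    """Fraction of objects whose decision is determined by `attrs`."""
    groups = {}
    for *cond, dec in rows:
        key = tuple(cond[i] for i in attrs)
        groups.setdefault(key, []).append(dec)
    pos = sum(len(d) for d in groups.values() if len(set(d)) == 1)
    return pos / len(rows)

def fitness(bits, rows, costs, beta=1.0):
    """Assumed fitness: feasible subsets score >= 1 and cheaper is better."""
    attrs = [i for i, b in enumerate(bits) if b]
    gamma = dependency(rows, attrs)
    full = dependency(rows, range(len(bits)))
    if gamma >= beta * full:  # beta < 1 tolerates some noise
        total = sum(costs)
        return 1.0 + (total - sum(costs[i] for i in attrs)) / total
    return gamma  # infeasible subsets always score below 1

def ga_reduct(rows, costs, beta=1.0, pop_size=20, generations=40,
              p_mut=0.1, seed=0):
    """Hypothetical GA: tournament selection, one-point crossover, elitism."""
    rng = random.Random(seed)
    n = len(costs)  # assumes at least two condition attributes

    def score(bits):
        return fitness(bits, rows, costs, beta)

    # Seeding with the full attribute set guarantees a feasible individual.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = [max(pop, key=score)[:]]  # elitism keeps the best so far
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1)
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return [i for i, b in enumerate(best) if b]

# Toy decision table: four condition attributes, then the decision.
# The decision depends only on attribute 1.
rows = [
    (0, 0, 1, 0, "n"),
    (0, 1, 1, 0, "y"),
    (1, 0, 0, 1, "n"),
    (1, 1, 0, 1, "y"),
    (0, 1, 0, 0, "y"),
    (1, 0, 1, 1, "n"),
]
costs = [1, 1, 1, 1]  # equal costs reduce to the minimal-cardinality case
result = ga_reduct(rows, costs)
```

With equal costs the fitness rewards small attribute sets, and unequal entries in `costs` steer the search toward a lowest-cost subset instead. Because the initial population contains the full attribute set and the best individual is always carried over, the returned subset preserves the dependency of the full table; whether it is also minimal depends on the GA parameters.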
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journal
2001, Issue 3, pp. 280-284 (5 pages)
Pattern Recognition and Artificial Intelligence
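Funding
National Natural Science Foundation of China
Keywords
Knowledge reduction
Cost constraint
Noisy data
Information table
Data mining
Database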
Genetic Algorithm, Rough Sets, Optimal Knowledge Reduction, Approximate Knowledge Reduction