Abstract
Concept drift detection is a major research topic in data stream mining, and uncertainty analysis is central to rough set theory. Big data and data streams exhibit both changes in uncertainty and concept drift. However, except for F-rough sets, almost all rough set models are static or semi-dynamic: they focus on various kinds of uncertainty, but they struggle to handle changes in uncertainty and to detect concept drift. Drawing on the basic ideas of quantum computing, data streams, concept drift, rough sets and F-rough sets, this paper uses upper and lower approximations to define entire-granulation rough sets in a knowledge system, together with concept drift of upper and lower approximations and coupling of upper and lower approximations. The properties of entire-granulation rough sets are investigated, and the global change of a concept within a knowledge system is analyzed. Entire-granulation rough sets inherit the basic ideas of Pawlak rough sets and F-rough sets, and use families of upper and lower approximations to represent every possible change of a concept within a knowledge system. Nested Hasse diagrams express the identity and diversity of a concept across different cases: representations on the same level exhibit no concept drift, whereas representations on different levels do. Using the positive region as the tool, the entire-granulation positive region, concept drift and concept coupling are defined for decision tables; the properties of the entire-granulation positive region are studied, and the global change of the overall concept within a decision table is analyzed. The entire-granulation positive region represents the positive regions of a decision table in all possible cases, and nested Hasse diagrams again express the identity and diversity of the family of positive regions: there is no concept drift relative to the positive region within the same level, but there is across different levels. Within entire-granulation rough sets, entire-granulation absolute reducts, entire-granulation value reducts and entire-granulation Pawlak reducts are defined, and their properties are investigated. Unlike most kinds of attribute reducts, and similar only to parallel reducts and multi-granulation reducts, entire-granulation attribute reducts require that no concept drift occurs in any possible representation of a concept. The advantages and disadvantages of attribute reduction are further discussed: reduction makes the representation of a concept uniform, whereas redundant attributes make concept representations richer and more diverse. From an epistemological viewpoint, the locality and wholeness of human cognition are analyzed with rough sets and granular computing, and the ways in which humans come to know the world are examined further. To some extent, entire-granulation rough sets can express the complexity, uncertainty, diversity, hierarchy and dynamics of human cognition, and with the help of quantum computing the model can move from one granulation to another without difficulty. The study of entire-granulation rough sets, and of concept drift detection within them, offers useful insights for concept drift detection under various conditions and for the simulation of human intelligence.
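As a rough illustration of the idea of examining a concept under every granulation, the sketch below computes the standard Pawlak lower and upper approximations of a target set for each non-empty attribute subset of a toy table, then groups the subsets whose lower approximations coincide. The table, the concept and all function names are hypothetical, and treating "equal lower approximations" as "no drift of the lower approximation" is an assumption made here for illustration; the paper's formal definitions are not given in this abstract and may differ.

```python
from itertools import combinations

# Hypothetical toy knowledge system: object -> attribute values.
TABLE = {
    "x1": {"a": 0, "b": 0},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 1},
    "x4": {"a": 1, "b": 1},
}
CONCEPT = {"x1", "x3"}  # hypothetical target concept X


def partition(table, attrs):
    """Group objects into equivalence classes by their values on attrs."""
    blocks = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(obj)
    return list(blocks.values())


def approximations(table, attrs, concept):
    """Return the Pawlak (lower, upper) approximations of concept under attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return frozenset(lower), frozenset(upper)


def all_granulations(table, concept):
    """Map every non-empty attribute subset to its (lower, upper) pair."""
    attrs = sorted(next(iter(table.values())))
    family = {}
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            family[subset] = approximations(table, subset, concept)
    return family


def group_by_lower(family):
    """Group attribute subsets whose lower approximations coincide.

    Assumption: subsets in the same group show no drift of the lower
    approximation relative to one another; different groups do.
    """
    groups = {}
    for subset, (lower, _upper) in family.items():
        groups.setdefault(lower, []).append(subset)
    return groups


family = all_granulations(TABLE, CONCEPT)
groups = group_by_lower(family)
for lower, subsets in groups.items():
    print(sorted(lower), subsets)
```

In this toy table the subsets ("b",) and ("a", "b") share the lower approximation {x1}, while ("a",) alone yields an empty lower approximation, so under the stated assumption only the move to or from ("a",) would count as a drift of the lower approximation.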
Authors
邓大勇
卢克文
苗夺谦
黄厚宽
DENG Da-Yong; LU Ke-Wen; MIAO Duo-Qian; HUANG Hou-Kuan (College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004; Xingzhi College, Zhejiang Normal University, Jinhua, Zhejiang 321004; School of Electronics and Information Engineering, Tongji University, Shanghai 201804; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044)
Source
Chinese Journal of Computers (《计算机学报》)
EI
CSCD
Peking University Core Journal
2019, No. 1, pp. 85-97 (13 pages)
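Funding
Supported by the National Natural Science Foundation of China (61473030, 61572442, 61203247, 61273304, 61573259, 61472166) and the Natural Science Foundation of Zhejiang Province (LY15F020012)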
Keywords
entire-granulation rough sets
concept drifting
partial ordering relation
concept coupling
upper and lower approximations