Abstract
Visual clustering analysis uses visualization and interaction techniques to help users analyze the clustering process and its results from multiple perspectives, and so discover structures and relationships hidden in the data. However, the "curse of dimensionality" inherent in high-dimensional data creates many challenges for cluster analysis. These include setting model parameters, capturing data features, interpreting results, and visualizing them. Starting from the problems encountered when clustering high-dimensional data, this paper first summarizes the data processing methods commonly used in that process and compares their performance. These methods largely alleviate the curse of dimensionality and help users uncover the cluster patterns present in the data. Clusters obtained with different data processing methods call for different exploration strategies when users analyze and interpret the internal structure and regularities they contain. The paper therefore divides the visual analysis approaches for high-dimensional data clustering proposed over roughly the past ten years into two categories: approaches based on dimensionality reduction and approaches based on subspace clustering. Finally, it discusses the current opportunities and challenges in this field.
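As a minimal illustration of the "curse of dimensionality" mentioned above (not taken from the paper), the sketch below samples points uniformly in the unit hypercube and measures the relative contrast (d_max - d_min) / d_min of Euclidean distances from one query point. The uniform data, the Euclidean metric, and the function name `distance_contrast` are illustrative assumptions. As the dimension grows, the contrast shrinks, so "near" and "far" neighbors become hard to tell apart, which is one reason distance-based clustering struggles in high dimensions.

```python
import math
import random


def distance_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Hypothetical helper: relative distance contrast (d_max - d_min) / d_min
    from one random query point to n_points uniform points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast typically drops sharply as dimensionality increases.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={distance_contrast(dim):.3f}")
```

Running the script prints one line per dimension; the exact values depend on the seed, but the downward trend is the point of the example.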
Authors
ZHANG Rong
CHEN Yi
ZHANG Meng-lu
MENG Ke-xin
(Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China; School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei 430070, China)
Source
Journal of Graphics (《图学学报》)
CSCD
Peking University Core Journal
2020, Issue 1, pp. 44-56 (13 pages)
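Funding
National Key Research and Development Program of China (2018YFC1603602)
National Natural Science Foundation of China (61972010)
National Science and Technology Basic Work Program (2015FY111200)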
Keywords
visual analysis
clustering
high-dimensional data
survey