Abstract
This paper proposes a semi-supervised K-means clustering algorithm for multi-type relational data. The algorithm extends traditional K-means clustering with new methods for selecting the initial clusters and for measuring object similarity, so that it can be applied to semi-supervised learning on multi-type relational data. To achieve high performance, the clustering process uses labeled data, object attributes, and the various kinds of relationship information. Experimental results on the Movie multi-relational database confirm the effectiveness of the algorithm.
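The general idea of using labeled data to choose the initial clusters can be illustrated with a generic seeded K-means. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the function name `seeded_kmeans`, the `seeds` mapping, and the use of plain Euclidean distance on attribute vectors are all illustrative choices, and the relationship-based similarity measure described in the abstract is not reproduced here.

```python
from math import dist


def seeded_kmeans(points, seeds, k, max_iter=100):
    """Generic seeded K-means (illustrative, not the paper's method).

    points: list of equal-length numeric tuples (attribute vectors).
    seeds:  dict {point_index: cluster_id} for the labeled points;
            every cluster id in range(k) needs at least one seed.
    """
    dim = len(points[0])

    def mean(members):
        # Component-wise mean of the points with the given indices.
        return tuple(
            sum(points[i][d] for i in members) / len(members)
            for d in range(dim)
        )

    # Initial centroids come from the labeled points of each cluster.
    centroids = []
    for c in range(k):
        members = [i for i, label in seeds.items() if label == c]
        if not members:
            raise ValueError(f"cluster {c} has no labeled seed point")
        centroids.append(mean(members))

    assignment = None
    for _ in range(max_iter):
        # Labeled points keep their label; unlabeled points go to the
        # nearest centroid (Euclidean distance is an assumption here).
        new_assignment = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda c: dist(p, centroids[c]))
            for i, p in enumerate(points)
        ]
        if new_assignment == assignment:
            break  # converged: no assignment changed
        assignment = new_assignment
        # Recompute each centroid; clusters are never empty because
        # their seed points stay fixed.
        for c in range(k):
            members = [i for i, a in enumerate(assignment) if a == c]
            centroids[c] = mean(members)
    return assignment, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = seeded_kmeans(points, seeds={0: 0, 3: 1}, k=2)
print(assignment)
```

In this sketch the labeled points stay in their given clusters throughout, which is one common design choice for seeded clustering. A similarity measure that also accounts for relationships between objects would replace the `dist` call.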
Source
《软件学报》
EI
CSCD
Peking University Core Journals
2008, No. 11, pp. 2814-2821 (8 pages)
Journal of Software
Funding
Supported by the National Natural Science Foundation of China under Grant Nos. 60496321, 60773099, 60573073;
the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 2006AA10Z244, 2006AA10A309;
the Science and Technology Development Plan of Jilin Province of China under Grant No. 20030523;
the European Commission under Grant No. TH/Asia Link/010 (111084).
Keywords
data mining
semi-supervised learning
clustering algorithm
multi-type relational data
K-means clustering