Abstract
Pawlak's rough set theory is a supervised learning model that applies only to discrete data. Many practical problems, however, involve large amounts of continuous data in which labeled samples are scarce and unlabeled samples are abundant, a setting that lies outside the scope of Pawlak's rough set theory. This paper combines neighborhood rough sets with co-training and proposes a neighborhood rough set based co-training classification model that handles continuous data and exploits unlabeled data alongside labeled data, achieving better performance than a classifier trained on the few labeled samples alone. First, a heuristic semi-supervised reduction algorithm based on neighborhood mutual information is constructed to compute reducts of partially labeled continuous data. The algorithm is then used to extract two reducts that differ substantially from each other. The model trains one base classifier per reduct on the labeled data and lets the two base classifiers teach each other on the unlabeled data, iteratively boosting their performance. Experimental results on selected UCI data sets show that the proposed model achieves higher learning accuracy on partially labeled continuous data than several representative models of the same kind.
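The co-training step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the semi-supervised reduction algorithm is not reproduced, so the two feature index lists `view_a` and `view_b` are hypothetical stand-ins for the two reducts, and the k-nearest-neighbor base classifier, the vote-share confidence, and the parameters `rounds`, `per_round`, and `k` are all assumptions chosen for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """k-NN vote; returns predicted labels and the winning vote share as confidence."""
    labels, conf = [], []
    for x in X_query:
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dist)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        best = np.argmax(counts)
        labels.append(values[best])
        conf.append(counts[best] / len(nearest))
    return np.array(labels), np.array(conf)

def co_train(X_lab, y_lab, X_unlab, view_a, view_b, rounds=5, per_round=2, k=3):
    """Two classifiers, each restricted to its own feature subset, label data for each other.

    view_a / view_b are hypothetical stand-ins for the two reducts.
    Returns the final training set (X, y) of each classifier.
    """
    views = {"a": view_a, "b": view_b}
    train = {name: (X_lab.copy(), y_lab.copy()) for name in views}
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        picked, additions = set(), {}
        for teacher, student in (("a", "b"), ("b", "a")):
            X_t, y_t = train[teacher]
            cols = views[teacher]
            labels, conf = knn_predict(X_t[:, cols], y_t, pool[:, cols], k)
            # The teacher hands its most confident pseudo-labels to the student.
            top = np.argsort(-conf)[:per_round]
            additions[student] = (pool[top], labels[top])
            picked.update(top.tolist())
        for student, (X_new, y_new) in additions.items():
            X_s, y_s = train[student]
            train[student] = (np.vstack([X_s, X_new]), np.concatenate([y_s, y_new]))
        # Remove every pseudo-labeled sample from the unlabeled pool.
        pool = pool[np.setdiff1d(np.arange(len(pool)), list(picked))]
    return train
```

Each classifier only ever sees the columns of its own view, so the two learners can disagree and correct each other. This is why the abstract asks for two reducts that differ substantially.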
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journal
2014, No. 8, pp. 1811-1820 (10 pages)
Funding
National Natural Science Foundation of China (61075056, 61273304, 61202170, 61103067)
Fundamental Research Funds for the Central Universities
Keywords
neighborhood rough sets
neighborhood mutual information
semi-supervised reduction
co-training
continuous data