Quick Discretization Algorithm for Rough Set Based on Dynamic Clustering

Cited by: 1
Abstract: Discretizing decision tables that contain a large number of objects requires an efficient discretization algorithm. By analyzing how the importance values of candidate cuts are distributed on a single attribute, the paper proposes a two-step strategy, "cluster dynamically first, then select cuts", together with a fast discretization algorithm based on rough set theory. First, the candidate cuts are dynamically clustered according to the distribution of their importance on each attribute, which substantially reduces the number of candidate cuts. Second, starting from the clustering result, a heuristic method quickly selects the final cut set, which is then used to discretize the decision table. Experimental results show that dynamic clustering reduces the number of candidate cuts by more than 80% on most data sets, which greatly speeds up the subsequent cut selection. On seven UCI data sets (Iris, Wine, Glass, Ecoli, Breast_w, Pima, and Letter), the proposed algorithm achieves correct recognition rates of about 92.0%, 92.1%, 69.3%, 65.7%, 95.3%, 67.1%, and 76.5%, respectively.
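The two-step procedure described in the abstract can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' implementation. Candidate cuts are assumed to be midpoints between adjacent distinct attribute values. A cut's importance is assumed to be the number of object pairs with different decisions that it separates (the usual Boolean-reasoning measure). The clustering rule in `cluster_cuts` (keep one cut per local importance peak on each attribute) is a hypothetical stand-in, because the abstract does not state the clustering criterion. The selection step is a standard greedy heuristic that repeatedly takes the cut discerning the most remaining pairs. All function names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Cut = Tuple[int, float]  # (attribute index, cut value)
Table = Sequence[Sequence[float]]


def conflict_pairs(X: Table, y: Sequence[int]) -> List[Tuple[int, int]]:
    """Object pairs with different decisions: the pairs a cut set must discern."""
    return [(p, q) for p, q in combinations(range(len(X)), 2) if y[p] != y[q]]


def discerns(cut: Cut, X: Table, p: int, q: int) -> bool:
    """A cut discerns p and q if their values lie on opposite sides of it."""
    j, c = cut
    return (X[p][j] < c) != (X[q][j] < c)


def importance(cut: Cut, X: Table, pairs: Sequence[Tuple[int, int]]) -> int:
    """Assumed importance: number of the given conflict pairs this cut separates."""
    return sum(discerns(cut, X, p, q) for p, q in pairs)


def candidate_cuts(X: Table, j: int) -> List[Cut]:
    """Midpoints between consecutive distinct values of attribute j."""
    vals = sorted({row[j] for row in X})
    return [(j, (a + b) / 2) for a, b in zip(vals, vals[1:])]


def cluster_cuts(cuts: List[Cut], w: List[int]) -> List[Cut]:
    """Hypothetical clustering step: along one attribute, keep a cut when importance
    rises into it and does not rise right after it, i.e. the first cut of each
    local peak or plateau; all other cuts of that group are dropped."""
    reps = []
    for i in range(len(cuts)):
        rising = i == 0 or w[i] > w[i - 1]
        no_higher_next = i == len(cuts) - 1 or w[i] >= w[i + 1]
        if rising and no_higher_next:
            reps.append(cuts[i])
    return reps


def select_cuts(X: Table, y: Sequence[int], cuts: List[Cut]) -> List[Cut]:
    """Greedy selection: repeatedly take the cut that discerns the most remaining pairs."""
    pairs = conflict_pairs(X, y)
    remaining, chosen = list(cuts), []
    while pairs and remaining:
        best = max(remaining, key=lambda c: importance(c, X, pairs))
        if importance(best, X, pairs) == 0:
            break  # the clustered cuts cannot discern the remaining pairs
        chosen.append(best)
        remaining.remove(best)
        pairs = [(p, q) for p, q in pairs if not discerns(best, X, p, q)]
    return chosen


def discretize(X: Table, y: Sequence[int]) -> Tuple[List[Cut], int, int]:
    """Step 1: cluster candidate cuts per attribute; step 2: greedy selection.
    Returns (selected cuts, number of candidate cuts, number kept after clustering)."""
    pairs = conflict_pairs(X, y)
    total, clustered = 0, []
    for j in range(len(X[0])):
        cuts = candidate_cuts(X, j)
        total += len(cuts)
        clustered += cluster_cuts(cuts, [importance(c, X, pairs) for c in cuts])
    return select_cuts(X, y, clustered), total, len(clustered)


def apply_cuts(X: Table, cuts: List[Cut]) -> List[List[int]]:
    """Replace each value by the index of the interval (on its attribute) it falls into."""
    return [[sum(1 for k, c in cuts if k == j and v >= c) for j, v in enumerate(row)]
            for row in X]


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.0], [4.0, 7.0], [5.0, 6.0], [6.0, 7.0]]
    y = [0, 0, 0, 1, 1, 1]
    cuts, total, kept = discretize(X, y)
    print(cuts, total, kept)  # [(0, 3.5)] 7 2
    print(apply_cuts(X, cuts))
```

On this toy table, clustering shrinks 7 candidate cuts to 2 and the greedy step keeps only the cut at 3.5 on the first attribute. Because `cluster_cuts` discards cuts, some conflict pairs may become undiscernible; `select_cuts` then stops instead of looping. The pairwise `importance` costs O(n^2) per cut for readability; on larger tables the same count can be obtained per cut as L*R - sum_k(L_k * R_k), where L_k and R_k are the numbers of objects of decision class k to the left and right of the cut.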
Source: Journal of Southwest Jiaotong University (《西南交通大学学报》), indexed in EI, CSCD, and the Peking University core journal list; 2010, No. 6, pp. 977-983 (7 pages)
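Funding: National Natural Science Foundation of China (60573068, 60773113); Key Natural Science Foundation of Chongqing (2008BA2017); Chongqing Science Fund for Distinguished Young Scholars (2008BA2041); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ090512)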
Keywords: rough set; decision table; discretization; clustering
