
Research on telecom insolvency mining oriented data quality assessing strategy

Cited by: 8
Abstract: This paper addresses telecom insolvency (arrears) mining. Given the class-imbalanced nature of telecom insolvency data, it focuses on how missing values and outliers affect classification results, and proposes a Data Quality Assessment System for Telecom Insolvency Mining (TIM-DQAS). For missing-value assessment, an attribute weighting algorithm based on class-distribution differences is proposed to measure the missing cost of each input attribute. For outlier assessment, the impact of outliers in imbalanced data on classification results is analyzed, and the concept of outlier degree is introduced to quantify that impact. Comparative experiments on telecom Personal Handy-phone System (PHS, "Xiaolingtong") data from one city provide reference values for the assessment results and verify the effectiveness of the assessment strategy.
Source: Computer Engineering and Applications, indexed in CSCD and the Peking University Core Journals list, 2011, No. 12, pp. 220-224, 233 (6 pages)
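Funding: National High Technology Research and Development Program of China (863 Program), Nos. 2008AA042902 and 2009AA04Z162; Program of Introducing Talents of Discipline to Universities (111 Project), No. B07031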
Keywords: telecom; data mining; insolvency; data quality assessment; missing value; imbalance; outlier degree
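
The abstract names an attribute weighting algorithm based on class-distribution differences but does not give its formula. The sketch below is only an illustration of the general idea, not the paper's algorithm: it assumes categorical attributes, uses total variation distance between the per-class value distributions as the weight, and defines the missing cost as the weighted sum of per-attribute missing rates. The function names `class_distribution_gap` and `missing_cost`, the sample table, and the choice of distance are all assumptions made for this example.

```python
from collections import Counter


def class_distribution_gap(values, labels, positive):
    """Total variation distance between an attribute's value distribution
    in the positive class and in the remaining records.
    Missing values (None) are ignored. 0 = identical, 1 = disjoint."""
    pos = Counter(v for v, y in zip(values, labels) if v is not None and y == positive)
    neg = Counter(v for v, y in zip(values, labels) if v is not None and y != positive)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        return 0.0
    keys = set(pos) | set(neg)
    return 0.5 * sum(abs(pos[k] / n_pos - neg[k] / n_neg) for k in keys)


def missing_cost(table, labels, positive):
    """Return (weights, cost): weights are the normalized class-distribution
    gaps, cost is the weighted sum of per-attribute missing rates."""
    gaps = {a: class_distribution_gap(col, labels, positive) for a, col in table.items()}
    total = sum(gaps.values())
    # Fall back to uniform weights when no attribute separates the classes.
    weights = {a: (g / total if total else 1.0 / len(gaps)) for a, g in gaps.items()}
    rates = {a: sum(v is None for v in col) / len(col) for a, col in table.items()}
    return weights, sum(weights[a] * rates[a] for a in table)


# Hypothetical toy data: label 1 marks the (minority) insolvent customers.
labels = [1, 1, 0, 0, 0, 0]
table = {
    "plan": ["A", "A", "B", "B", "B", None],      # separates the classes, one value missing
    "region": ["x", "y", "x", "y", "x", "y"],     # same distribution in both classes
}
weights, cost = missing_cost(table, labels, positive=1)
print(weights, cost)
```

In this toy table, a missing value in `plan` is penalized while `region` carries no weight, which reflects the intuition that missing values matter most in attributes whose distributions differ between insolvent and solvent customers.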
