Variance clustering based outlier identification algorithm for time series data (cited by: 2)
Abstract Outliers in time series data directly affect the results of data mining algorithms and can even cause them to fail. The traditional Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm can be used to identify outliers, but it is sensitive to its parameters, has high time complexity, and offers limited accuracy. Considering the characteristics of time series data, this paper proposes a variance clustering based outlier identification algorithm that can automatically perform multiple identification passes. The method converts the traditional neighborhood density into variance and mean, and converts the density threshold into a variance and a threshold within a time window. Building on definitions of outlier data, outlier cluster data, and abnormal cluster data, it gives the judgment rules for outlier identification. Because a single identification pass may not eliminate all outliers, the algorithm is extended to a multi-pass identification algorithm by defining a termination condition for repeated identification. Application in an aerospace data mining project verified that the algorithm is general, has low time complexity, and can improve accuracy through multiple identification passes.
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2012, Issue A02, pp. 22-25 (4 pages)
Keywords time series data; outlier identification; clustering data mining; DBSCAN algorithm
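The abstract describes the approach only at a high level, so the following is a minimal Python sketch of the general idea rather than the paper's actual judgment rules. It assumes a symmetric time window around each point, a hypothetical rule that flags a point when its deviation from the window mean exceeds `k` standard deviations (with a variance floor `var_floor`), and a termination condition of "stop when a pass flags nothing new". The names `flag_outliers_once`, `identify_outliers`, `window`, `var_floor`, `k`, and `max_passes` are all illustrative and do not come from the paper. The distinction between outlier data, outlier cluster data, and abnormal cluster data is not modeled here.

```python
from statistics import mean, pvariance


def flag_outliers_once(values, rejected, window=3, var_floor=0.01, k=3.0):
    """Run one identification pass and return the indices newly flagged.

    rejected[i] is True when values[i] was flagged in an earlier pass.
    The decision rule is an assumption made for illustration only.
    """
    newly_flagged = []
    n = len(values)
    for i in range(n):
        if rejected[i]:
            continue
        # Accepted neighbours inside the time window, excluding the point itself.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [values[j] for j in range(lo, hi)
                      if j != i and not rejected[j]]
        if len(neighbours) < 2:
            continue  # too little context to estimate a variance
        mu = mean(neighbours)
        var = pvariance(neighbours, mu)
        # Window mean and variance stand in for neighbourhood density;
        # the variance floor keeps the limit from collapsing on flat data.
        limit = k * max(var, var_floor) ** 0.5
        if abs(values[i] - mu) > limit:
            newly_flagged.append(i)
    return newly_flagged


def identify_outliers(values, window=3, var_floor=0.01, k=3.0, max_passes=10):
    """Repeat single passes until one finds nothing new (assumed end condition)."""
    rejected = [False] * len(values)
    for _ in range(max_passes):
        found = flag_outliers_once(values, rejected, window, var_floor, k)
        if not found:
            break
        # Update the mask only after the pass, so one pass sees a fixed state.
        for i in found:
            rejected[i] = True
    return [i for i, flag in enumerate(rejected) if flag]


if __name__ == "__main__":
    series = [1.0, 1.1, 0.9, 1.0, 100.0, 10.0, 0.95, 1.0, 1.1, 0.9]
    print(identify_outliers(series))
```

In this sketch a very large spike inflates the variance of every window that contains it, which can hide a smaller neighbouring outlier during the first pass. Once the spike is rejected, the next pass recomputes the window statistics without it, which is the motivation the abstract gives for repeated identification.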
