A Novel Lazy Time Series Classification Algorithm Based on the Shapelets
Abstract

In recent years, time series classification has received growing attention, and shapelet-based classification has proved to be an effective approach. However, extracting the best shapelet requires building a candidate shapelet set that contains many redundant elements, and the shapelets obtained are usually discriminative only in an average sense. Conventional models also tend to ignore the local features of the instance to be classified. To address this, we propose a lazy classification model driven by the salient local features of the instance to be classified. The model builds a separate, data-driven lazy shapelet classification model for each such instance, progressively narrowing the search space of time series relevant to its classification, so that the shapelets obtained directly reflect the salient local features of that instance. Experimental results show that the proposed model achieves higher accuracy and stronger interpretability.

To uncover the characteristics of data and to explain the prediction process of classification models, the study of interpretable models has become increasingly prevalent in recent years. Massive time series data are available in many fields, such as weather forecasting, medical monitoring, and anomaly detection. Time series classification is an important research area within time series data mining. Unlike traditional attribute-vector data, a time series has no explicit attributes. Even with sophisticated feature selection techniques, the dimensionality of the potential feature space remains beyond the acceptable range. This makes it challenging to learn an accurate classification model with strong interpretability. Because the shapelet is a new primitive that can be used to construct interpretable models, shapelet-based time series classification has recently attracted considerable interest. Shapelet-based classification is a typical shape-based approach, and shapelets offer insight into the local discriminative features of time series. According to how shapelets are used, shapelet-based models fall into two categories. The first builds a much smaller yet more discriminative feature set from the top-k shapelets and uses it to transform the original dataset, so that traditional classification algorithms can be applied to the resulting low-dimensional dataset. The second uses the selected shapelets to build the classification model directly.

However, these global shapelet-based models have some obvious shortcomings. First, a global model always needs to create a candidate shapelet set that contains a large number of redundant elements while extracting the best shapelet. Because of redundant instances and intra-class variation, the extracted shapelets are good for the training instances only in an average sense, so the resulting shapelet-based model may be neither suitable nor efficient for the test cases. Second, the shapelets obtained may come from different instances or be approximate solutions, so they cannot indicate the local characteristics of the test case exactly. Third, since the class value of the local features of the test case is unknown, the characteristics of test cases are always ignored.

To solve these problems, a data-driven local shapelet-based model is proposed for each test case. In our model, local similarity rather than global similarity is the basis for classification. The local features of the test case are evaluated directly to find the most discriminative shapelet, and that shapelet is then used to progressively reduce the search space of the class attribute value. Because the shapelets come from the test example, they directly reflect the salient local features of the test case and can answer the question of why the model assigns a certain class value to the instance. Meanwhile, during shapelet evaluation, instances are selected so as to reduce the impact of redundant instances and intra-class variation. The lazy classification model presented in this paper is compared with two shapelet decision tree models, 1NN models based on different distance functions, and C4.5 models based on different top-k shapelet transformation algorithms. Experimental results show that the proposed model has higher accuracy and stronger interpretability.
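The lazy procedure described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the distance measure, the quality measure, or the stopping rule, so the sketch assumes Euclidean subsequence distance, information gain, and a stop when the remaining training instances share one class. The function names, the `min_len`/`max_len` parameters, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def subsequence_distance(candidate, series):
    """Smallest Euclidean distance between `candidate` and any equal-length
    window of `series` (assumes len(series) >= len(candidate))."""
    m = len(candidate)
    return min(
        math.sqrt(sum((c - x) ** 2 for c, x in zip(candidate, series[i:i + m])))
        for i in range(len(series) - m + 1)
    )


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def lazy_classify(query, train, min_len=2, max_len=4):
    """Classify `query` using shapelets drawn only from `query` itself.

    `train` is a non-empty list of (series, label) pairs. Each round picks the
    query subsequence whose distance threshold gives the highest information
    gain (an assumed criterion), then keeps only the training instances on the
    near side of that threshold, so the pool shrinks every round.
    """
    pool = list(train)
    while len({label for _, label in pool}) > 1:
        base = entropy([label for _, label in pool])
        best_gain, best_near = 0.0, None
        for length in range(min_len, min(max_len, len(query)) + 1):
            for start in range(len(query) - length + 1):
                candidate = query[start:start + length]
                ranked = sorted(
                    ((subsequence_distance(candidate, s), s, y) for s, y in pool),
                    key=lambda item: item[0],
                )
                for k in range(1, len(ranked)):
                    if ranked[k - 1][0] == ranked[k][0]:
                        continue  # no threshold fits between equal distances
                    near = [y for _, _, y in ranked[:k]]
                    far = [y for _, _, y in ranked[k:]]
                    weighted = (k * entropy(near) + len(far) * entropy(far)) / len(ranked)
                    gain = base - weighted
                    if gain > best_gain:
                        best_gain = gain
                        best_near = [(s, y) for _, s, y in ranked[:k]]
        if best_near is None:
            break  # no informative split left; fall back to majority vote
        pool = best_near
    return Counter(label for _, label in pool).most_common(1)[0][0]


# Toy data (hypothetical): "bump" series contain a short plateau, "flat" ones do not.
train = [
    ([0, 0, 0, 5, 5, 0, 0, 0], "bump"),
    ([0, 5, 5, 0, 0, 0, 0, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 0, 0], "flat"),
    ([0, 0, 1, 0, 0, 0, 0, 0], "flat"),
]
query = [0, 0, 0, 0, 0, 5, 5, 0]
print(lazy_classify(query, train))
```

Because every candidate is a subsequence of the query, the query is at distance zero from it and always falls on the near side of the threshold. Keeping only the near side is therefore one plausible reading of "progressively reducing the search space", and the selected subsequences double as the explanation for the predicted class.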
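Authors: WANG Zhi-Hai (王志海); ZHANG Wei (张伟); YUAN Ji-Dong (原继东); LIU Hai-Yang (刘海洋) (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044)
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2019, No. 1, pp. 29-43 (15 pages)
Funding: Supported by the National Natural Science Foundation of China (61672086, 61702030, 61771058), the China Postdoctoral Science Foundation (2018M631328), the Fundamental Research Funds for the Central Universities (2017YJS036), and the Beijing Natural Science Foundation (4182052)
Keywords: time series; lazy learning; classification; shapelets; interpretability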