Abstract
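Dissimilarity and similarity measurement are two fundamental problems in data mining. To address dissimilarity measurement for time series, this paper defines the region radius, region extreme point, and region of a time series, and proposes a strategy for extracting region extreme points. Extracting representative extreme points reduces and compresses the time series data, and a dynamic time warping (DTW) distance is then defined on the time series to measure their dissimilarity. On this basis, a new hierarchical clustering algorithm for time series is proposed. Simulation results show that, compared with algorithms such as time series trend feature extraction, the proposed algorithm clearly improves both data compression and clustering accuracy.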
Dissimilarity or similarity measurement is a key issue in data mining. Time series data is hard to measure because of its original structure. To address the problem of time series similarity measurement, this paper proposes a re-description method based on the locally extreme points of time series, in which the original series is described by its extracted locally extreme points, effectively reflecting its main features and compressing the data. Measuring the extreme-point series after equal-length treatment enhances the flexibility of the algorithm and reduces its limitations. On this basis, the method is applied to hierarchical clustering of time series. Simulation results show that the clustering and data compression effects are evident, and that clustering accuracy improves considerably compared with other algorithms based on time series trend feature extraction.
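The pipeline summarized in the abstract (extract extreme points, measure dissimilarity with dynamic time warping, then cluster hierarchically) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the exact definitions, so the sketch assumes that a point is a region extreme point when it is the maximum or minimum of the window `[i - radius, i + radius]`, that series endpoints are always kept, and that clusters are merged by average linkage. The function names and the `radius` and `k` parameters are hypothetical.

```python
from itertools import combinations


def region_extreme_points(series, radius):
    """Keep the endpoints plus every point that is the max or min of its window.

    Assumption: a "region" is the window [i - radius, i + radius]; the
    paper's exact definition is not stated in the abstract.
    """
    n = len(series)
    kept = []
    for i, value in enumerate(series):
        window = series[max(0, i - radius): min(n, i + radius + 1)]
        if i in (0, n - 1) or value == max(window) or value == min(window):
            kept.append(value)
    return kept


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of the three admissible predecessor cells.
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]


def hierarchical_clusters(series_list, k, radius=1):
    """Agglomerative clustering (average linkage, an assumption) into k groups."""
    reduced = [region_extreme_points(s, radius) for s in series_list]
    dist = {(i, j): dtw_distance(reduced[i], reduced[j])
            for i, j in combinations(range(len(reduced)), 2)}
    clusters = [[i] for i in range(len(reduced))]

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the two clusters with the smallest average DTW distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Because DTW aligns sequences of different lengths, the reduced series need not be resampled to equal length before comparison in this sketch. The equal-length treatment mentioned in the English abstract is not reproduced here.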
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2015, No. 5, pp. 33-37 (5 pages)
Computer Engineering
Funding
Fundamental Research Funds for the Central Universities (JUSRP211A41)
Jiangsu Province Industry-University-Research Prospective Fund (BY2013015-23)
Keywords
time series
locally extreme point
re-description
data compression
similarity measure
hierarchical clustering