Abstract
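Research on the analysis and management of streaming data is currently a hot topic in the international database research community. Over the past 30-plus years, traditional database technology has developed rapidly and been widely applied, yet it cannot handle a new kind of data, namely streaming data, generated in applications such as network routing, sensor networks, and stock analysis. Streaming data arrive continuously, at high speed and in massive volume. The core of the research is to design efficient single-pass algorithms that, within a memory space far smaller than the data, continuously update a structure representing the data set (a synopsis data structure), so that approximate query results can be obtained quickly from this structure at any time. This paper surveys international research on generating and maintaining synopsis data structures for streaming data, and compares the characteristics, strengths, and weaknesses of the algorithms by enumerating the various solutions to two important problems on streaming data.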
The study of streaming data has recently become a hot topic in the international database research community. Over the past three decades, conventional database technologies have developed rapidly and been widely applied. However, they cannot handle a new kind of data, called streaming data, that is generated by applications such as network routing, sensor networks, and stock analysis. Because data in the stream model arrive rapidly and the data set is huge, novel algorithms that scan the whole data set only once are devised to support aggregate queries on demand. Such algorithms usually maintain a data structure, called a synopsis data structure, that is far smaller than the whole data set. This paper introduces the ways to devise such synopsis data structures and compares the approaches by reviewing prior work on two classical problems over streams.
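To make the single-pass, small-memory idea concrete, here is a minimal Python sketch of one well-known kind of synopsis: a fixed-size uniform random sample maintained by reservoir sampling. The abstract does not name any particular technique, so the choice of reservoir sampling is an assumption made only for illustration. The names `ReservoirSample`, `summarize`, `capacity`, and `estimate_mean` are likewise hypothetical.

```python
import random
from typing import Iterable, List, Optional


class ReservoirSample:
    """Fixed-size uniform sample of a stream, maintained in a single pass.

    Illustrative assumption: the sample plays the role of the synopsis;
    memory use is O(capacity) no matter how long the stream is.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[float] = []
        self.seen = 0  # number of stream elements processed so far
        self._rng = random.Random(seed)

    def update(self, value: float) -> None:
        """Process one arriving element; each element is looked at once."""
        self.seen += 1
        if len(self.items) < self.capacity:
            # The reservoir is not full yet, so keep the element.
            self.items.append(value)
            return
        # Keep the new element with probability capacity / seen by
        # overwriting a uniformly chosen slot.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = value

    def estimate_mean(self) -> float:
        """Approximate the mean of everything seen so far from the sample."""
        if not self.items:
            raise ValueError("no data seen yet")
        return sum(self.items) / len(self.items)


def summarize(stream: Iterable[float], capacity: int,
              seed: Optional[int] = None) -> ReservoirSample:
    """Scan the stream once and return the resulting synopsis."""
    synopsis = ReservoirSample(capacity, seed)
    for value in stream:
        synopsis.update(value)
    return synopsis
```

An approximate aggregate can be read from the synopsis at any point during the scan, for example `summarize(range(100000), capacity=100).estimate_mean()`. The answer is an estimate whose accuracy depends on `capacity`, not an exact result.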
Source
《软件学报》
EI
CSCD
PKU Core (Peking University list of Chinese core journals)
2004, Issue 8, pp. 1172-1181 (10 pages)
Journal of Software
Funding
National High Technology Research and Development Program of China (863 Program)