Abstract
Hybrid transactional and analytical processing (HTAP) uses a single, one-stop architecture to process transactional requests and analytical queries at the same time. It eliminates the extract-transform-load (ETL) process that moves data from a relational transactional database to a data warehouse, and it supports real-time analysis of the latest transactional data. To serve OLTP and OLAP together, however, an HTAP system must trade off system performance and workload isolation against data freshness, mainly because highly concurrent, short-lived OLTP workloads and bandwidth-intensive, long-running OLAP workloads have different access patterns and interfere with each other. Most mainstream HTAP databases support mixed workloads by keeping a row store and a column store side by side, but because they target different application scenarios, their storage architectures and processing techniques differ. This study comprehensively surveys HTAP databases: it classifies state-of-the-art systems by storage architecture, summarizes their main application scenarios, and compares their advantages and disadvantages. Whereas previous surveys focus on HTAP databases built on a single storage format (row-only or column-only) and on Spark-based loosely coupled HTAP systems, this survey focuses on real-time HTAP databases in which row and column stores coexist. In particular, it distills the key techniques of mainstream HTAP databases into four parts: data organization, data synchronization, query optimization, and resource scheduling. It also summarizes and analyzes techniques for building HTAP databases and the existing HTAP benchmarks. Finally, it discusses future research directions and open challenges for HTAP.
Authors
张超
李国良
冯建华
张金涛
ZHANG Chao; LI Guo-Liang; FENG Jian-Hua; ZHANG Jin-Tao (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Source
《软件学报》
EI
CSCD
Peking University Core Journals
2023, Issue 2, pp. 761-785 (25 pages)
Journal of Software
Funding
National Natural Science Foundation of China (61925205, 62072261, 62232009).
Keywords
HTAP databases
row-column coexistence
data organization
query optimization
data synchronization
resource scheduling