摘要
垂直数据分区技术从逻辑上将满足一定语义条件的数据库表属性存放在同一个物理块中,进而降低数据访问成本,提高查询效率.数据库查询负载中的每条查询通常只与数据库表中的部分属性有关,因此只需使用数据库表的某个属性子集便可以得到准确的查询结果.合理的垂直数据分区方式可以使大多数查询负载不需要扫描完整数据库就可以完成查询任务,从而达到减少数据访问量,提高查询处理效率的目的.传统的数据库垂直分区方法主要基于专家设置的启发式规则,分区策略粒度较粗,且不能根据负载的特征进行有针对性的分区优化.同时,当负载规模较大或者属性个数较多时,现有垂直分区方法执行时间过长,尤其无法满足数据库在线实时调优的性能需求.为此,提出在线环境下基于谱聚类的垂直数据分区方法(spectral clustering based vertical partitioning,SCVP),采用分阶段求解的思想,减少算法时间复杂度,加快分区执行速度.首先通过增加约束条件缩小解空间(即根据谱聚类生成初始分区),然后对解空间设计算法进行精细的搜索(即采用频繁项集和贪心搜索相结合的策略对初始分区进行优化).为了进一步提升SCVP在高维属性下的性能,提出了SCVP的改进版本SCVP-R(spectral clustering based vertical partitioning redesign).SCVP-R通过引入同域竞争机制、双败淘汰机制和循环机制,对SCVP在分区优化过程中的合并方案进行了进一步优化.在不同数据集上的实验结果表明,相比于目前最好的垂直分区方法,SCVP和SCVP-R有着更快的执行时间和更好的性能表现.
Vertical data partitioning technology logically stores database table attributes satisfying certain semantic conditions in the same physical block,so as to reduce the cost of data accessing and improve the efficiency of query processing.Every query is usually related only to the table’s some attributes in the database,so a subset of the table’s attributes can be used to get the accurate query results.Reasonable vertical data partitioning can make most queries answered without scanning the whole table,so as to reduce the amount of data accessing and improve the efficiency of query processing.Traditional database vertical partitioning methods are mainly based on heuristic rules set by experts.The granularity of partitioning is coarse,and it can not provide different partition optimizations according to the characteristics of workload.Besides,when the scale of workload or the number of attributes becomes large,the execution time of the existing methods are too long to meet the performance requirements of online real-time tuning of database.Therefore,a method called spectral clustering based vertical partitioning(SCVP)is proposed for the online environment.The idea of phased solution is adapted to reduce the time complexity of the algorithm and speed up partitioning.Firstly,SCVP reduces the solution space by increasing the constraint conditions,that is,generating initial partitions by spectral clustering.Secondly,SCVP designs the algorithm to search solution space,that is,the initial partitions are optimized by combining frequent itemset mining and greedy search.In order to further improve the performance of SCVP under high-dimensional attributes,a new method called special clustering based vertical partitioning redesign(SCVP-R)is proposed which is an improved version of SCVP.SCVP-R optimizes the partitions combiner component of SCVP by introducing sympatric-competition mechanism,double-elimination mechanism,and loop mechanism.The experimental results on different datasets show that SCVP and SCVP-R have faster execution time and better performance than the current state-of-the-art vertical partitioning method.
作者
刘鹏举
李好洋
王天一
刘欢
孙路明
任逸飞
李翠平
陈红
LIU Peng-Ju;LI Hao-Yang;WANG Tian-Yi;LIU Huan;SUN Lu-Ming;REN Yi-Fei;LI Cui-Ping;CHEN Hong(Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education(Renmin University of China),Beijing 100872,China;School of Information,Renmin University of China,Beijing 100872,China;Huawei Cloud Database Innovation Lab,Shenzhen 518100,China)
出处
《软件学报》
EI
CSCD
北大核心
2023年第6期2804-2832,共29页
Journal of Software
基金
国家重点研发计划(2018YFB1004401)
国家自然科学基金(61772537,61772536,62072460,62076245,62172424)。
关键词
垂直分区
谱聚类
频繁项集
贪心搜索
多阶段决策
vertical partitioning
spectral clustering
frequent itemsets
greedy search
multistage decision