Abstract
To improve the performance of complex data operations on an e-commerce big data platform, this paper analyzes the characteristics of e-commerce workloads and optimizes the platform in two respects: data reorganization and platform parameter tuning. For data reorganization, data is stored in the ORC format and the tables are partitioned and bucketed appropriately. For parameter tuning, the parameters of the main components involved in the workload are adjusted in a targeted way. The approach and methods are validated through simulation experiments that run the TPC-DS benchmark on a 32-node Hadoop cluster. The results show that the tuned platform delivers roughly 7.5 times the performance of a platform without any optimization, a significant improvement.
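The data reorganization step can be pictured with a short sketch. The abstract does not give the table layouts, so everything specific below is an assumption made for illustration: the helper name `build_orc_table_ddl`, the subset of columns from the TPC-DS `store_sales` table, the choice of `ss_sold_date_sk` as partition key and `ss_item_sk` as bucket key, the bucket count of 16, and the ZLIB compression setting. The function only assembles a HiveQL `CREATE TABLE` statement as a string; it does not connect to a cluster.

```python
from typing import Sequence, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def build_orc_table_ddl(
    table: str,
    columns: Sequence[Column],
    partition_cols: Sequence[Column],
    bucket_col: str,
    num_buckets: int,
    compression: str = "ZLIB",
) -> str:
    """Assemble a HiveQL DDL string for a partitioned, bucketed ORC table.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Hive buckets on regular columns, not on partition columns.
    if bucket_col not in {name for name, _ in columns}:
        raise ValueError("bucket column must be a regular (non-partition) column")

    col_sql = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    part_sql = ", ".join(f"{name} {typ}" for name, typ in partition_cols)
    return (
        f"CREATE TABLE {table} (\n  {col_sql}\n)\n"
        f"PARTITIONED BY ({part_sql})\n"
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES ('orc.compress'='{compression}');"
    )


# Assumed layout: a few columns of the TPC-DS store_sales fact table.
ddl = build_orc_table_ddl(
    table="store_sales",
    columns=[
        ("ss_item_sk", "BIGINT"),
        ("ss_quantity", "INT"),
        ("ss_net_paid", "DECIMAL(7,2)"),
    ],
    partition_cols=[("ss_sold_date_sk", "BIGINT")],
    bucket_col="ss_item_sk",
    num_buckets=16,  # illustrative value, not a figure from the paper
)
print(ddl)
```

Partitioning on a date key lets queries that filter by date skip whole directories, while bucketing on a join key can reduce shuffle work in joins; the keys and bucket count that suit a given workload would need to be chosen from its actual query patterns.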
Authors
MA Ya-ming (马亚铭); TAO Li-min (陶利民); LIU Zi-qi (刘子琦)
Institute 503, China Academy of Space Technology, Beijing 100095, China
Source
Software Guide (《软件导刊》)
2020, No. 5, pp. 186-189 (4 pages)