Abstract
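The data volume of commercial banks keeps growing as traditional business expands and Internet adoption rises, placing ever higher demands on data storage, management, and application. By building a big data platform on Hadoop, using components such as the distributed file system HDFS, the SQL analysis engine Inceptor, the NoSQL database Hyperbase, and the stream processing tool Stream, this paper explores the construction of a Hadoop-based distributed data warehouse for a large commercial bank, and completes the migration from a traditional relational data warehouse built on a centralized storage architecture to a distributed data warehouse. The distributed data warehouse supports storage of structured and unstructured data, ETL scheduling management, historical data retrieval, interactive analysis, and stream data processing. In application, compared with the traditional relational data warehouse based on a centralized storage architecture, the distributed data warehouse substantially improves the efficiency of data storage and data services.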
With the expansion of traditional business and the development of the Internet, the rapid growth of data volumes in commercial banks demands stronger capabilities for storing, managing, and applying large amounts of data. Based on Hadoop and its associated frameworks, including HDFS, Inceptor, Hyperbase, and Stream, a distributed data warehouse for commercial banks was constructed. Various applications were migrated from the relational data warehouse based on a centralized storage architecture, including storage of heterogeneous data, management of ETL processing, historical data retrieval, interactive analysis, and streaming data processing. The results show that, compared with the relational data warehouse, the distributed data warehouse substantially improves the efficiency of data storage and data services.
Source
《计算机应用与软件》
2017, No. 8, pp. 72-75, 113 (5 pages)
Computer Applications and Software