Abstract
Big data will have a profound impact on the economy, society, and daily life, and research on the integration and storage of big data is of theoretical and practical significance for broadening and deepening its applications. This paper exploits the data storage structure of the distributed file system HBase and uses the basic-element of Extenics to integrate heterogeneous data sets, storing the processed data in an HBase database. Typical characteristics and their attribute values are extracted from the data, especially semi-structured and unstructured data, and converted into basic-elements to generate a new data set. This new data set provides not only a new way to analyze and interpret data but also research ideas and solutions, from a big data perspective, for strategy generation on domain problems.
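The conversion described in the abstract can be illustrated with a minimal sketch. It assumes the usual Extenics notation of a basic-element as an ordered triple (object, characteristic, value). The record contents, the helper names `to_basic_elements` and `to_hbase_rows`, the column family `c`, and the choice of the object name as row key are all hypothetical and are not taken from the paper. A plain dictionary stands in for an HBase table so that the example runs without a cluster.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class BasicElement(NamedTuple):
    """One Extenics basic-element: (object, characteristic, value)."""
    obj: str
    characteristic: str
    value: str


def to_basic_elements(obj: str, attributes: Dict[str, object]) -> List[BasicElement]:
    """Turn the extracted attributes of one object into basic-elements."""
    return [BasicElement(obj, name, str(val)) for name, val in attributes.items()]


def to_hbase_rows(elements: List[BasicElement], family: str = "c") -> Dict[str, Dict[str, str]]:
    """Group basic-elements into an HBase-style layout.

    Assumed layout (not from the paper): row key = object name,
    column = "<family>:<characteristic>", cell = value.
    """
    rows: Dict[str, Dict[str, str]] = defaultdict(dict)
    for e in elements:
        rows[e.obj][f"{family}:{e.characteristic}"] = e.value
    return dict(rows)


# Hypothetical input records with different sets of attributes.
sensor_record = {"name": "sensor-01", "fields": {"type": "thermometer", "unit": "C"}}
document_record = {"name": "doc-17", "fields": {"title": "Report", "pages": 6}}

elements: List[BasicElement] = []
for rec in (sensor_record, document_record):
    elements.extend(to_basic_elements(rec["name"], rec["fields"]))

rows = to_hbase_rows(elements)
```

Because each characteristic becomes its own column qualifier, objects with different attribute sets can share one table, which is the property of a column-oriented store that makes it a plausible target for heterogeneous data.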
Source
Journal of Guangdong University of Technology (《广东工业大学学报》)
CAS
2014, Issue 3, pp. 8-13 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61273306)
Keywords
big data
basic-element
data model
distributed file
Extenics