Abstract
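The massive volume of unstructured data emerging from social networks and e-commerce marks the arrival of the big data era. The characteristics of big data, such as variety, enormous scale and scalability, place new demands on the platforms that run it. As big data has emerged and developed, a representative information architecture has taken shape, including programming models, virtualisation and distributed file systems. As research on big data deepens, analysis of big data workload characteristics shows that the main constraint is not computing power but I/O latency. Using a memory-based distributed file system to store and process the indexes used for queries over large-scale distributed file systems can effectively reduce I/O latency and improve application performance.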
With the advent of social networks and e-commerce, the amount of unstructured data is growing rapidly. The 4Vs of big data (Volume, Velocity, Variety and Veracity) motivate new architecture designs for computing systems, including programming models, virtualisation technology and distributed file systems. Analysis of big data workloads shows that I/O latency is one of the dominant performance bottlenecks. Techniques are proposed that create and store indexes in a memory-based distributed file system; these significantly reduce I/O latency and thus improve system performance.
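The abstract does not describe the index design in detail. As a minimal sketch, assuming a simple inverted index (term to a list of (file, offset) pairs) held entirely in memory, the code below counts simulated disk reads to illustrate why keeping the index in memory cuts I/O: a query consults memory first and then reads only the matching records, instead of scanning every file. `ToyFileStore`, `InMemoryIndex`, `query_by_scan` and `query_by_index` are hypothetical names, not taken from the paper, and the single-process store only stands in for a distributed file system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToyFileStore:
    """Hypothetical stand-in for a disk-backed distributed file system.
    Keys are file paths (e.g. one part-file per node); every read() counts as one I/O."""
    files: dict[str, list[str]]
    reads: int = 0

    def read(self, path: str, offset: int) -> str:
        self.reads += 1  # simulate one disk I/O per record read
        return self.files[path][offset]

    def scan(self, path: str):
        # A full scan touches every record of the file on "disk".
        for offset in range(len(self.files[path])):
            yield offset, self.read(path, offset)


class InMemoryIndex:
    """Hypothetical in-memory inverted index: term -> [(path, offset), ...]."""

    def __init__(self) -> None:
        self.postings: dict[str, list[tuple[str, int]]] = defaultdict(list)

    def build(self, store: ToyFileStore) -> None:
        # One-time full pass over the data; afterwards lookups need no disk I/O.
        for path in store.files:
            for offset, record in store.scan(path):
                for term in set(record.split()):
                    self.postings[term].append((path, offset))

    def lookup(self, term: str) -> list[tuple[str, int]]:
        # .get() does not insert missing keys into the defaultdict.
        return list(self.postings.get(term, []))


def query_by_scan(store: ToyFileStore, term: str) -> list[str]:
    """Baseline: read every record of every file and filter."""
    return [record
            for path in store.files
            for _, record in store.scan(path)
            if term in record.split()]


def query_by_index(store: ToyFileStore, index: InMemoryIndex, term: str) -> list[str]:
    """Index-assisted: resolve locations in memory, then read only the hits."""
    return [store.read(path, offset) for path, offset in index.lookup(term)]


if __name__ == "__main__":
    store = ToyFileStore({
        "node1/part-0": ["alice bought book", "bob viewed phone"],
        "node2/part-0": ["carol bought phone", "dave viewed book"],
    })
    index = InMemoryIndex()
    index.build(store)

    store.reads = 0
    print(query_by_scan(store, "phone"), store.reads)
    store.reads = 0
    print(query_by_index(store, index, "phone"), store.reads)
```

In this toy run the scan-based query reads all 4 records, while the index-assisted query reads only the 2 matching ones; the gap grows with data size, since the scan cost scales with the whole dataset and the indexed cost only with the number of hits. The one-time cost of building the index is paid up front, which is the trade-off an index kept in a memory-based file system is meant to amortise.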
Source
《微型机与应用》
2014, No. 2, pp. 15-17, 24 (4 pages)
Microcomputer & Its Applications
Keywords
big data
workload characteristic
memory system
system architecture