Abstract
Further gains in processor performance increasingly depend on extracting more performance from the memory system. As the number of on-chip cores grows and feature sizes continue to shrink, a cache coherence protocol that scales well in both latency and storage overhead has become a key factor in memory access efficiency. This paper proposes NPP, a node-prediction-based direct cache coherence protocol, and studies techniques for hiding coherence transaction latency and reducing directory storage overhead. To address the indirection inherent in read and write misses, and the weaknesses of existing proposals that break established data locality or cannot obtain the nearest valid copy, we propose a node-hanging technique and a direct write-miss handling technique, which effectively hide the directory access latency on read and write misses. To make node prediction accurate, we further propose a history-information update algorithm based on "signature" recycling, which avoids both redundant and incomplete updates. Using the SPLASH-2 benchmark suite on a 64-core CMP interconnected by a 2D mesh NoC, NPP reduces average execution time by 21.78%-31.11%, average read-miss latency by 14.22%-18.9%, and average write-miss latency by 17.89%-21.13% relative to a flat full-map directory protocol. The cost of these gains is an average increase of 6.62%-7.28% in on-chip network traffic.
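To make the prediction idea concrete, below is a minimal C sketch of a last-holder predictor of the kind such a protocol could use: each core records which node was last observed to hold a valid copy of a block, so that a later miss can be sent to that node directly instead of detouring through the home directory. The table size, indexing, and update rule are illustrative assumptions, not the paper's actual NPP design; in particular, the signature-recycling update algorithm is not modeled here.

```c
/*
 * Illustrative sketch only: a direct-mapped last-holder predictor that a
 * prediction-based coherence protocol could consult on a cache miss to
 * bypass the home directory. Sizes, hashing, and the update policy are
 * assumptions for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define PRED_ENTRIES 1024       /* assumed per-core predictor capacity      */
#define INVALID_NODE 0xFF       /* sentinel: no prediction available        */

typedef struct {
    uint64_t tag;               /* block address                            */
    uint8_t  node;              /* last node seen holding a valid copy      */
    uint8_t  valid;
} PredEntry;

typedef struct {
    PredEntry table[PRED_ENTRIES];
} NodePredictor;

static unsigned pred_index(uint64_t block_addr)
{
    return (unsigned)(block_addr % PRED_ENTRIES);
}

/* On a read/write miss: return the predicted holder, or INVALID_NODE to
 * fall back to the ordinary directory lookup at the home node. */
static uint8_t predict_node(const NodePredictor *p, uint64_t block_addr)
{
    const PredEntry *e = &p->table[pred_index(block_addr)];
    if (e->valid && e->tag == block_addr)
        return e->node;
    return INVALID_NODE;
}

/* On observing a coherence reply/forward that reveals where the valid copy
 * lives, record it so the next miss to this block can be sent directly. */
static void update_predictor(NodePredictor *p, uint64_t block_addr, uint8_t holder)
{
    PredEntry *e = &p->table[pred_index(block_addr)];
    e->tag   = block_addr;
    e->node  = holder;
    e->valid = 1;
}

int main(void)
{
    NodePredictor pred = {0};
    uint64_t blk = 0x7f3a40;            /* hypothetical block address */

    /* first miss: no history, so the request must go to the home directory */
    printf("prediction: %u\n", (unsigned)predict_node(&pred, blk));

    update_predictor(&pred, blk, 12);   /* reply revealed node 12 holds the copy */

    /* later miss: the request can be sent directly to node 12,
     * hiding the directory indirection */
    printf("prediction: %u\n", (unsigned)predict_node(&pred, blk));
    return 0;
}
```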
Source
Chinese Journal of Computers (《计算机学报》)
2014, No. 3, pp. 700-720 (21 pages)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
Funding
Supported by the National Science and Technology Major Project "Core Electronic Components, High-end Generic Chips and Basic Software" (2009ZX01039-003-001-03, 2009ZX01023-004) and the National Natural Science Foundation of China (60905007).