Abstract
In CC-NUMA systems, large-scale designs usually adopt multi-tier cache coherency domains to reduce the overhead of cache coherency maintenance. Lowering the average number of coherency maintenance operations eases the tension between performance scaling and coherency overhead. The traditional MESI, MESIF, and MOESI protocols are optimized mainly for a single-tier coherency domain and do not account for the fact that queries (data reads) dominate the workload of large database applications. As a result, in a multi-tier cache coherency domain these protocols suffer from frequent cross-domain operations and low execution efficiency. To address these problems, this paper proposes MESI-SF, a multi-tier cache coherency protocol based on a shared-forwarding state. The protocol introduces a shared-forwarding state, Share-F, which allows a readable, forwardable copy of remote data to exist in multiple coherency domains at the same time. A copy in the Share-F state can directly supply shared data to read requests for the same address within its own domain, which reduces cross-domain operations and improves system performance. Simulations with the SPLASH-2 suite show that, for a system with two-tier cache coherency domains, MESI-SF reduces cross-node accesses by 23.0% and the average cycles per instruction (CPI) by 7.5% compared with MESI, and reduces cross-node accesses by 12.2% and CPI by 5.95% compared with MESIF.
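The effect of a forwardable shared copy on cross-domain traffic can be illustrated with a toy model. The sketch below is not the MESI-SF protocol itself: it handles only read requests to remote addresses, ignores writes and invalidations, and assumes the first reader in a domain keeps the Share-F copy. The function name `count_cross_domain_reads` and the request format are hypothetical, chosen only for this illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"          # readable, but cannot forward (baseline assumption)
    SHARE_F = "Share-F"   # readable and forwardable copy of remote data


def count_cross_domain_reads(requests, use_share_f):
    """Count read misses that must leave the requester's domain.

    requests: iterable of (domain_id, cache_id, address) read requests.
    Toy assumption: every address is homed in some other domain.
    """
    state = {}         # (domain_id, cache_id, address) -> State
    forwarder = set()  # (domain_id, address) pairs that hold a Share-F copy
    cross = 0
    for dom, cache, addr in requests:
        key = (dom, cache, addr)
        if state.get(key, State.INVALID) is not State.INVALID:
            continue  # hit in the requester's own cache, no traffic
        if use_share_f and (dom, addr) in forwarder:
            # Served by the Share-F copy inside the same domain.
            state[key] = State.SHARED
            continue
        cross += 1  # the read has to go to the home domain
        if use_share_f:
            # Assumption: the first reader in a domain becomes the forwarder.
            state[key] = State.SHARE_F
            forwarder.add((dom, addr))
        else:
            state[key] = State.SHARED
    return cross


if __name__ == "__main__":
    reads = [(0, 0, 0x40), (0, 1, 0x40), (0, 0, 0x40),
             (1, 0, 0x40), (1, 1, 0x40), (0, 2, 0x40)]
    print(count_cross_domain_reads(reads, use_share_f=False))
    print(count_cross_domain_reads(reads, use_share_f=True))
```

In this model each domain pays for one cross-domain read per address when a forwardable copy exists, instead of one per reading cache. The numbers it produces are illustrative only and are unrelated to the SPLASH-2 results reported in the abstract.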
Authors
陈继承
李一韩
赵雅倩
王恩东
史宏志
唐士斌
Chen Jicheng; Li Yihan; Zhao Yaqian; Wang Endong; Shi Hongzhi; Tang Shibin (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085)
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals
2017, Issue 4, pp. 764-774 (11 pages)
Journal of Computer Research and Development
Funding
National High Technology Research and Development Program of China (863 Program) (2013AA011701)