期刊文献+
共找到10篇文章
< 1 >
每页显示 20 50 100
Efficient cache replacement framework based on access hotness for spacecraft processors
1
作者 GAO Xin NIAN Jiawei +1 位作者 LIU Hongjin YANG Mengfei 《中国空间科学技术(中英文)》 CSCD 北大核心 2024年第2期74-88,共15页
A notable portion of cachelines in real-world workloads exhibits inner non-uniform access behaviors.However,modern cache management rarely considers this fine-grained feature,which impacts the effective cache capacity... A notable portion of cachelines in real-world workloads exhibits inner non-uniform access behaviors.However,modern cache management rarely considers this fine-grained feature,which impacts the effective cache capacity of contemporary high-performance spacecraft processors.To harness these non-uniform access behaviors,an efficient cache replacement framework featuring an auxiliary cache specifically designed to retain evicted hot data was proposed.This framework reconstructs the cache replacement policy,facilitating data migration between the main cache and the auxiliary cache.Unlike traditional cacheline-granularity policies,the approach excels at identifying and evicting infrequently used data,thereby optimizing cache utilization.The evaluation shows impressive performance improvement,especially on workloads with irregular access patterns.Benefiting from fine granularity,the proposal achieves superior storage efficiency compared with commonly used cache management schemes,providing a potential optimization opportunity for modern resource-constrained processors,such as spacecraft processors.Furthermore,the framework complements existing modern cache replacement policies and can be seamlessly integrated with minimal modifications,enhancing their overall efficacy. 展开更多
关键词 spacecraft processors cache management replacement policy storage efficiency memory hierarchy MICROARCHITECTURE
下载PDF
Performance Behaviour Analysis of the Present 3-Level Cache System for Multi-Core Processors
2
作者 Muhammad Ali Ismail 《Computer Technology and Application》 2012年第11期729-733,共5页
In this paper, a study related to the expected performance behaviour of present 3-level cache system for multi-core systems is presented. For this a queuing model for present 3-level cache system for multi-core proces... In this paper, a study related to the expected performance behaviour of present 3-level cache system for multi-core systems is presented. For this a queuing model for present 3-level cache system for multi-core processors is developed and its possible performance has been analyzed with the increase in number of cores. Various important performance parameters like access time and utilization of individual cache at different level and overall average access time of the cache system is determined. Results for up to 1024 cores have been reported in this paper. 展开更多
关键词 MULTI-CORE memory hierarchy cache access time queuing analysis.
下载PDF
Skyway:Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management
3
作者 Mo Zou Ming-Zhe Zhang +4 位作者 Ru-Jia Wang Xian-He Sun Xiao-Chun Ye Dong-Rui Fan Zhi-Min Tang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2024年第4期871-894,共24页
Graph processing is a vital component of many AI and big data applications.However,due to its poor locality and complex data access patterns,graph processing is also a known performance killer of AI and big data appli... Graph processing is a vital component of many AI and big data applications.However,due to its poor locality and complex data access patterns,graph processing is also a known performance killer of AI and big data applications.In this work,we propose to enhance graph processing applications by leveraging fine-grained memory access patterns with a dual-path architecture on top of existing software-based graph optimizations.We first identify that memory accesses to the offset,edge,and state array have distinct locality and impact on performance.We then introduce the Skyway architecture,which consists of two primary components:1)a dedicated direct data path between the core and memory to transfer state array elements efficiently,and 2)a data-type aware fine-grained memory-side row buffer hardware for both the newly designed direct data path and the regular memory hierarchy data path.The proposed Skyway architecture is able to improve the overall performance by reducing the memory access interference and improving data access efficiency with a minimal overhead.We evaluate Skyway on a set of diverse algorithms using large real-world graphs.On a simulated fourcore system,Skyway improves the performance by 23%on average over the best-performing graph-specialized hardware optimizations. 展开更多
关键词 graph application computer architecture memory hierarchy
原文传递
A Survey of Non-Volatile Main Memory Technologies:State-of-the-Arts,Practices,and Future Directions 被引量:5
4
作者 Hai-Kun Liu Di Chen +4 位作者 Hai Jin Xiao-Fei Liao Binsheng He Kan Hu Yu Zhang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2021年第1期4-32,共29页
Non-Volatile Main Memories (NVMMs) have recently emerged as a promising technology for future memory systems. Generally, NVMMs have many desirable properties such as high density, byte-addressability, non-volatility, ... Non-Volatile Main Memories (NVMMs) have recently emerged as a promising technology for future memory systems. Generally, NVMMs have many desirable properties such as high density, byte-addressability, non-volatility, low cost, and energy efficiency, at the expense of high write latency, high write power consumption, and limited write endurance. NVMMs have become a competitive alternative of Dynamic Random Access Memory (DRAM), and will fundamentally change the landscape of memory systems. They bring many research opportunities as well as challenges on system architectural designs, memory management in operating systems (OSes), and programming models for hybrid memory systems. In this article, we revisit the landscape of emerging NVMM technologies, and then survey the state-of-the-art studies of NVMM technologies. We classify those studies with a taxonomy according to different dimensions such as memory architectures, data persistence, performance improvement, energy saving, and wear leveling. Second, to demonstrate the best practices in building NVMM systems, we introduce our recent work of hybrid memory system designs from the dimensions of architectures, systems, and applications. At last, we present our vision of future research directions of NVMMs and shed some light on design challenges and opportunities. 展开更多
关键词 non-volatile memory persistent memory hybrid memory systems memory hierarchy
原文传递
Adapting Memory Hierarchies for Emerging Datacenter Interconnects 被引量:1
5
作者 江涛 侯锐 +5 位作者 董建波 柴琳 Sally A. McKee 田斌 张立新 孙凝晖 《Journal of Computer Science & Technology》 SCIE EI CSCD 2015年第1期97-109,共13页
Efficient resource utilization requires that emerging datacenter interconnects support both high performance communication and efficient remote resource sharing. These goals require that the network be more tightly co... Efficient resource utilization requires that emerging datacenter interconnects support both high performance communication and efficient remote resource sharing. These goals require that the network be more tightly coupled with the CPU chips. Designing a new interconnection technology thus requires considering not only the interconnection itself, but also the design of the processors that will rely on it. In this paper, we study memory hierarchy implications for the design of high-speed datacenter interconnects particularly as they affect remote memory access -- and we use PCIe as the vehicle for our investigations. To that end, we build three complementary platforms: a PCIe-interconnected prototype server with which we measure and analyze current bottlenecks; a software simulator that lets us model microarchitectural and cache hierarchy changes; and an FPGA prototype system with a streamlined switchless customized protocol Thunder with which we study hardware optimizations outside the processor. We highlight several architectural modifications to better support remote memory access and communication, and quantify their impact and ]imitations. 展开更多
关键词 high-speed interconnect memory hierarchy time shared memory datacenter network
原文传递
A Study on Modeling and Optimization of Memory Systems
6
作者 Jason Liu Pedro Espina Xian-He Sun 《Journal of Computer Science & Technology》 SCIE EI CSCD 2021年第1期71-89,共19页
Accesses Per Cycle(APC),Concurrent Average Memory Access Time(C-AMAT),and Layered Performance Matching(LPM)are three memory performance models that consider both data locality and memory assess concurrency.The APC mod... Accesses Per Cycle(APC),Concurrent Average Memory Access Time(C-AMAT),and Layered Performance Matching(LPM)are three memory performance models that consider both data locality and memory assess concurrency.The APC model measures the throughput of a memory architecture and therefore reflects the quality of service(QoS)of a memory system.The C-AMAT model provides a recursive expression for the memory access delay and therefore can be used for identifying the potential bottlenecks in a memory hierarchy.The LPM method transforms a global memory system optimization into localized optimizations at each memory layer by matching the data access demands of the applications with the underlying memory system design.These three models have been proposed separately through prior efforts.This paper reexamines the three models under one coherent mathematical framework.More specifically,we present a new memorycentric view of data accesses.We divide the memory cycles at each memory layer into four distinct categories and use them to recursively define the memory access latency and concurrency along the memory hierarchy.This new perspective offers new insights with a clear formulation of the memory performance considering both locality and concurrency.Consequently,the performance model can be easily understood and applied in engineering practices.As such,the memory-centric approach helps establish a unified mathematical foundation for model-driven performance analysis and optimization of contemporary and future memory systems. 展开更多
关键词 performance modeling performance optimization memory architecture memory hierarchy concurrent average memory access time
原文传递
A Unified Buffering Management with Set Divisible Cache for PCM Main Memory
7
作者 Mei-Ying Bian Su-Kyung Yoon Jeong-Geun Kim Sangjae Nam Shin-Dug Kim 《Journal of Computer Science & Technology》 SCIE EI CSCD 2016年第1期137-146,共10页
This research proposes a phase-change memory (PCM) based main memory system with an effective combi- nation of a superblock-based adaptive buffering structure and its associated set divisible last-level cache (LLC... This research proposes a phase-change memory (PCM) based main memory system with an effective combi- nation of a superblock-based adaptive buffering structure and its associated set divisible last-level cache (LLC). To achieve high performance similar to that of dynamic random-access memory (DRAM) based main memory, the superblock-based adaptive buffer (SABU) is comprised of dual DRAM buffers, i.e., an aggressive superblock-based pre-fetching buffer (SBPB) and an adaptive sub-block reusing buffer (SBRB), and a set divisible LLC based on a cache space optimization scheme. According to our experiment, the longer PCM access latency can typically be hidden using our proposed SABU, which can significantly reduce the number of writes over the PCM main memory by 26.44%. The SABU approach can reduce PCM access latency up to 0.43 times, compared with conventional DRAM main memory. Meanwhile, the average memory energy consumption can be reduced by 19.7%. 展开更多
关键词 memory hierarchy memory structure cache memory buffer management
原文传递
On the Problem of Optimizing Parallel Programs for Complex Memory Hierarchies
8
作者 金国华 陈福接 《Journal of Computer Science & Technology》 SCIE EI CSCD 1994年第1期1-26,共26页
Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the thread... Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the threads of the nested loop (either with a single linear function or with multiple linear functions) can be determined at compile-time,and accordingly the nested loop (either perfectly nested one or imperfectly nested one)can be restructured to avoid the thrashing problem. Due to its simplicity, our method can be efficiently implemented in any parallel compiler, and the improvement of the performance is significant as shown by the experimental results. 展开更多
关键词 OPTIMIZATION parallel program complex memory hierarchies SRIS RSRIS compacted RSRIS
原文传递
Taxonomy of Data Prefetching for Multicore Processors 被引量:1
9
作者 Surendra Byna 陈勇 孙贤和 《Journal of Computer Science & Technology》 SCIE EI CSCD 2009年第3期405-417,共13页
Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support,... Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream. While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well. 展开更多
关键词 taxonomy of prefetching strategies multicore processors data prefetching memory hierarchy
原文传递
Software Support for LIRAC Architecture
10
作者 李鹏 汪东升 +3 位作者 王海霞 路美娟 李崇民 郑纬民 《Tsinghua Science and Technology》 SCIE EI CAS 2007年第6期700-706,共7页
Memory limitations are always a focus of computer architecture. The live range aware cache (LIRAC) offers a way to reduce memory access using live range information. In the LIRAC system, scratch data need not be wri... Memory limitations are always a focus of computer architecture. The live range aware cache (LIRAC) offers a way to reduce memory access using live range information. In the LIRAC system, scratch data need not be written back if the data will no longer be used. Three kinds of software support developed for LIRAC architecture use compiler analyses, binary analyses, and trace analyses. Trace analysis results show that LIRAC can eliminate 29% of cache write-backs on average and up to 83% in the best case for the SPEC CPU 2000 benchmark. These software techniques can show the feasibility and potential benefit of the LIRAC architecture. 展开更多
关键词 live range LIRAC CACHE memory hierarchy
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部