Journal Articles
6 articles found
1. Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management
Authors: Mo Zou, Ming-Zhe Zhang, Ru-Jia Wang, Xian-He Sun, Xiao-Chun Ye, Dong-Rui Fan, Zhi-Min Tang
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2024, No. 4, pp. 871-894
Graph processing is a vital component of many AI and big data applications. However, due to its poor locality and complex data access patterns, graph processing is also a known performance killer of AI and big data applications. In this work, we propose to enhance graph processing applications by leveraging fine-grained memory access patterns with a dual-path architecture on top of existing software-based graph optimizations. We first identify that memory accesses to the offset, edge, and state arrays have distinct locality and impact on performance. We then introduce the Skyway architecture, which consists of two primary components: 1) a dedicated direct data path between the core and memory to transfer state array elements efficiently, and 2) data-type-aware fine-grained memory-side row buffer hardware serving both the newly designed direct data path and the regular memory hierarchy data path. The proposed Skyway architecture improves overall performance by reducing memory access interference and improving data access efficiency with minimal overhead. We evaluate Skyway on a set of diverse algorithms using large real-world graphs. On a simulated four-core system, Skyway improves performance by 23% on average over the best-performing graph-specialized hardware optimizations.
Keywords: graph application; computer architecture; memory hierarchy
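The offset, edge, and state arrays the abstract refers to match the standard CSR (compressed sparse row) graph layout used by many graph frameworks; the minimal sketch below is an illustration under that assumption, not the paper's code. It shows why the offset and edge arrays are read mostly sequentially while the state array is hit at random, which is the locality gap Skyway's dual-path design targets.

```python
# Tiny directed graph in CSR form: edges 0->1, 0->2, 1->2, 2->0.
offset = [0, 2, 3, 4]        # offset[v]..offset[v+1] indexes into edge
edge   = [1, 2, 2, 0]        # destination vertex of each edge
state  = [1.0, 1.0, 1.0]     # per-vertex value (e.g., a rank score)

def push_step(offset, edge, state):
    """One push-style propagation step: each vertex sends its value
    to its out-neighbors (illustrative, not the paper's algorithm)."""
    nxt = [0.0] * len(state)
    for v in range(len(state)):                # sequential: offset array
        for i in range(offset[v], offset[v + 1]):  # sequential: edge array
            nxt[edge[i]] += state[v]           # random access: state array
    return nxt

print(push_step(offset, edge, state))   # -> [1.0, 1.0, 2.0]
```

The inner update scatters writes across `state`-sized data with little locality, which is why the abstract singles that array out for a dedicated data path.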
2. The Memory-Bounded Speedup Model and Its Impacts in Computing
Authors: Xian-He Sun, Xiao-Yang Lu
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, No. 1, pp. 64-79
With the surge of big data applications and the worsening of the memory-wall problem, the memory system, instead of the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first model to recognize memory as the bound of computing, and it provided a general bound of speedup and a computing-memory trade-off formulation. The memory-bounded model was well received even then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's Law, in parallel with Amdahl's Law and Gustafson's Law. Through the years, the impacts of this model have grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
Keywords: memory-bounded speedup; scalable computing; memory-wall; performance modeling and optimization; data-centric design
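As a refresher on the model this paper revisits, here is the memory-bounded speedup in one common textbook formulation, with Amdahl's and Gustafson's laws as its two extremes. The function names and the sample workload-growth function g(n) are illustrative assumptions, not the paper's notation.

```python
# Memory-bounded speedup (Sun-Ni's Law), textbook form:
#   f    : sequential fraction of the work
#   n    : number of processors
#   g(n) : factor by which the parallel workload grows as memory
#          scales with n.  g(n) = 1 recovers Amdahl's Law (fixed
#          size); g(n) = n recovers Gustafson's Law (fixed time).

def amdahl(f: float, n: int) -> float:
    return 1.0 / (f + (1.0 - f) / n)

def gustafson(f: float, n: int) -> float:
    return f + (1.0 - f) * n

def sun_ni(f: float, n: int, g) -> float:
    return (f + (1.0 - f) * g(n)) / (f + (1.0 - f) * g(n) / n)

f, n = 0.1, 16
print(f"Amdahl:    {amdahl(f, n):.2f}")      # fixed-size bound
print(f"Gustafson: {gustafson(f, n):.2f}")   # fixed-time bound
# Memory-bounded: workload grows sub-linearly with memory, e.g. g(n) = sqrt(n)
print(f"Sun-Ni:    {sun_ni(f, n, lambda m: m ** 0.5):.2f}")
```

With g growing sub-linearly, the memory-bounded speedup lands between the Amdahl and Gustafson bounds, which is the model's central point: memory capacity, not processor count alone, sets the achievable scaled speedup.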
3. I/O Acceleration via Multi-Tiered Data Buffering and Prefetching (cited: 2)
Authors: Anthony Kougkas, Hariharan Devarajan, Xian-He Sun
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, No. 1, pp. 92-120
Modern High-Performance Computing (HPC) systems are adding extra layers to the memory and storage hierarchy, named the deep memory and storage hierarchy (DMSH), to increase I/O performance. New hardware technologies, such as NVMe and SSD, have been introduced in burst buffer installations to reduce the pressure on external storage and boost the burstiness of modern I/O systems. The DMSH has demonstrated its strength and potential in practice. However, each layer of the DMSH is an independent heterogeneous system, and data movement among more layers is significantly more complex even without considering heterogeneity. How to efficiently utilize the DMSH is a subject of research facing the HPC community. Further, accessing data with high throughput and low latency is more imperative than ever. Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed, moving it from a high-latency medium (e.g., disk) to a low-latency one (e.g., main memory). However, existing solutions do not consider the new deep memory and storage hierarchy and also suffer from under-utilization of prefetching resources and unnecessary evictions. Additionally, existing approaches implement a client-pull model, where understanding the application's I/O behavior drives prefetching decisions. Moving towards exascale, where machines run multiple applications concurrently accessing files in a workflow, a more data-centric approach resolves challenges such as cache pollution and redundancy. In this paper, we present the design and implementation of Hermes: a new, heterogeneous-aware, multi-tiered, dynamic, and distributed I/O buffering system. Hermes enables, manages, supervises, and, in some sense, extends I/O buffering to fully integrate into the DMSH. We introduce three novel data placement policies to efficiently utilize all layers, and we present three novel techniques to perform memory, metadata, and communication management in hierarchical buffering systems. Additionally, we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching. Our evaluation shows that, in addition to automatic data movement through the hierarchy, Hermes can significantly accelerate I/O and outperforms state-of-the-art buffering platforms by more than 2x. Lastly, results show 10%-35% performance gains over existing prefetchers and over 50% when compared to systems with no prefetching.
Keywords: I/O buffering; heterogeneous buffering; layered buffering; deep memory hierarchy; burst buffers; hierarchical data prefetching; data-centric architecture
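To make the multi-tiered placement idea concrete, here is a minimal top-down placement sketch; the tier names and capacities are hypothetical, and Hermes's three actual placement policies are more sophisticated than this greedy spill-down.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity: int                  # capacity in arbitrary units
    used: int = 0
    contents: list = field(default_factory=list)

    def fits(self, size: int) -> bool:
        return self.used + size <= self.capacity

def place(tiers, buf_id: str, size: int) -> str:
    """Greedy top-down placement: try the fastest tier first and
    spill downward when a tier is full (illustration only)."""
    for tier in tiers:                 # tiers ordered fastest -> slowest
        if tier.fits(size):
            tier.used += size
            tier.contents.append(buf_id)
            return tier.name
    raise RuntimeError("all tiers full; an eviction policy would run here")

# Hypothetical DMSH: RAM over an NVMe burst buffer over a parallel FS.
tiers = [Tier("RAM", 4), Tier("NVMe burst buffer", 8), Tier("parallel FS", 64)]
print(place(tiers, "buf0", 3))   # -> RAM
print(place(tiers, "buf1", 2))   # RAM full -> NVMe burst buffer
```

Even this toy version shows the core problem the paper addresses: placement must track per-tier capacity and characteristics, and a real system must also manage metadata and eviction across tiers.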
4. A Study on Modeling and Optimization of Memory Systems
Authors: Jason Liu, Pedro Espina, Xian-He Sun
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, No. 1, pp. 71-89
Accesses Per Cycle (APC), Concurrent Average Memory Access Time (C-AMAT), and Layered Performance Matching (LPM) are three memory performance models that consider both data locality and memory access concurrency. The APC model measures the throughput of a memory architecture and therefore reflects the quality of service (QoS) of a memory system. The C-AMAT model provides a recursive expression for the memory access delay and therefore can be used to identify the potential bottlenecks in a memory hierarchy. The LPM method transforms a global memory system optimization into localized optimizations at each memory layer by matching the data access demands of the applications with the underlying memory system design. These three models were proposed separately through prior efforts. This paper reexamines the three models under one coherent mathematical framework. More specifically, we present a new memory-centric view of data accesses. We divide the memory cycles at each memory layer into four distinct categories and use them to recursively define the memory access latency and concurrency along the memory hierarchy. This new perspective offers new insights with a clear formulation of memory performance considering both locality and concurrency. Consequently, the performance model can be easily understood and applied in engineering practice. As such, the memory-centric approach helps establish a unified mathematical foundation for model-driven performance analysis and optimization of contemporary and future memory systems.
Keywords: performance modeling; performance optimization; memory architecture; memory hierarchy; concurrent average memory access time
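The APC idea can be sketched directly from its definition: total accesses divided by the number of cycles in which at least one memory access is outstanding, so overlapped accesses share cycles; C-AMAT is then its reciprocal. The access trace below is made up for illustration, and a real measurement would use hardware counters rather than explicit intervals.

```python
def apc(intervals):
    """APC from a trace of per-access (start_cycle, end_cycle) pairs,
    end exclusive. Cycles covered by overlapping accesses are counted
    once, which is how concurrency raises throughput."""
    active = set()
    for start, end in intervals:
        active.update(range(start, end))   # cycles with >=1 outstanding access
    return len(intervals) / len(active)

# Three accesses; the second and third overlap in cycles 6..9.
trace = [(0, 4), (4, 10), (6, 12)]
a = apc(trace)          # 3 accesses over 12 active cycles (0..11)
c_amat = 1 / a          # C-AMAT = 1/APC
print(f"APC = {a:.2f}, C-AMAT = {c_amat:.2f} cycles")   # APC = 0.25, C-AMAT = 4.00
```

Without the overlap, the same three accesses would occupy 16 cycles; the concurrency-aware count is what distinguishes C-AMAT from the classic AMAT.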
5. Preface
Authors: Xian-He Sun, Weikuan Yu
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, No. 1, pp. 1-3
It is our great pleasure to announce the publication of this special section in JCST, Selected I/O Technologies for High-Performance Computing and Data Analytics. With the explosive growth of colossal data from various academic and industrial sectors, many High-Performance Computing (HPC) and data analytics systems have been developed to meet the needs of data collection, processing, and analysis. Accordingly, many research groups around the world have explored unconventional and cutting-edge ideas for the management of storage and I/O.
Keywords: analysis; JCST; colossal
6. Preface
Authors: Xian-He Sun, Dong Li
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, No. 1, pp. 1-3
It is our great pleasure to announce the publication of this special section in the Journal of Computer Science and Technology (JCST), Memory-Centric System Research for High-Performance Computing (HPC). The growing disparity between CPU speed and memory speed, known as the memory-wall problem, has been a long-standing challenge in the computing industry. Several memory technologies and architectures, including 3D-stacked memory, non-volatile random-access memory (NVRAM), memristors, and hybrid software and hardware caches, have been introduced in recent years to address the infamous memory-wall problem.
Keywords: hardware; cache; computing