The multicore evolution has stimulated renewed interest in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But this scalability is limited to a single node; programmers therefore still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architectures over a well-connected cluster. It presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimal modifications. We deployed MAGI over an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25x speedup compared with an unoptimized implementation, leading to scalable performance for large-scale data-intensive applications.
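To make the page-fault claim concrete, the following is a toy model only, not MAGI's actual mechanism, which the abstract does not specify. It assumes a page-granularity DSM in which a node faults on the first access to a non-resident page, and it uses a hypothetical sequential-prefetch policy (`prefetch_pages`) to show how exploiting a predictable access pattern can reduce the fault count. The 4096-byte page size and all names are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # bytes per page (assumed value, not taken from the paper)


def count_page_faults(addresses, prefetch_pages=0):
    """Count faults for a node that starts with no pages resident.

    On each fault the faulting page is fetched together with the next
    `prefetch_pages` pages (a hypothetical sequential-prefetch policy).
    """
    resident = set()
    faults = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        if page not in resident:
            faults += 1
            # Fetch the faulting page and the following prefetch_pages pages.
            resident.update(range(page, page + prefetch_pages + 1))
    return faults


# Sequential scan over 64 pages, touching one 8-byte word at a time.
scan = range(0, 64 * PAGE_SIZE, 8)
baseline = count_page_faults(scan)
with_prefetch = count_page_faults(scan, prefetch_pages=7)
print(baseline, with_prefetch)  # prints "64 8"
```

In this model a sequential scan faults once per page without prefetching and once per eight pages with `prefetch_pages=7`. A real system would also have to account for transfer cost, coherence traffic, and wasted prefetches on irregular access patterns.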
In parallel and distributed computing, distributed shared memory (DSM) systems built on a network of workstations (NOW) are becoming increasingly popular. Achieving good programmability and performance on such systems, however, requires support for fine-grained parallelism and a simpler programming model. We designed a parallel distributed system consisting of six SUN SPARC workstations connected by 10 Mbit/s Ethernet, together with a thread-based MPI runtime library and a custom DSM-C programming language, with cache coherence implemented in software.
Funding: the National Key Research and Development Program of China under Grant No. 2016YFB1000500, the National Natural Science Foundation of China under Grant No. 61572314, and the National Youth Top-Notch Talent Support Program of China.