Recent architectures of multi-core systems may have a relatively large number of cores that typically ranges from tens to hundreds;therefore called many-core systems.Such systems require an efficient interconnection n...Recent architectures of multi-core systems may have a relatively large number of cores that typically ranges from tens to hundreds;therefore called many-core systems.Such systems require an efficient interconnection network that tries to address two major problems.First,the overhead of power and area cost and its effect on scalability.Second,high access latency is caused by multiple cores’simultaneous accesses of the same shared module.This paper presents an interconnection scheme called N-conjugate Shuffle Clusters(NCSC)based on multi-core multicluster architecture to reduce the overhead of the just mentioned problems.NCSC eliminated the need for router devices and their complexity and hence reduced the power and area costs.It also resigned and distributed the shared caches across the interconnection network to increase the ability for simultaneous access and hence reduce the access latency.For intra-cluster communication,Multi-port Content Addressable Memory(MPCAM)is used.The experimental results using four clusters and four cores each indicated that the average access latency for a write process is 1.14785±0.04532 ns which is nearly equal to the latency of a write operation in MPCAM.Moreover,it was demonstrated that the average read latency within a cluster is 1.26226±0.090591 ns and around 1.92738±0.139588 ns for read access between cores from different clusters.展开更多
Multicore systems oftentimes use multiple levels of cache to bridge the gap between processor and memory speed.This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual...Multicore systems oftentimes use multiple levels of cache to bridge the gap between processor and memory speed.This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual port content addressable memory(DPCAM).In addition,it proposes a new replacement algorithm based on hardware which is called a near-far access replacement algorithm(NFRA)to reduce the cost overhead of the cache controller and improve the cache access latency.The experimental results indicated that the latency for write and read operations are significantly less in comparison with a set-associative cache memory.Moreover,it was shown that a latency of a read operation is nearly constant regardless of the size of DPCAM.However,an estimation of the power dissipation showed that DPCAM consumes about 7%greater than a set-associative cache memory of the same size.These results encourage for embedding DPCAM within the multicore processors as a small shared cache memory.展开更多
文摘Recent architectures of multi-core systems may have a relatively large number of cores that typically ranges from tens to hundreds;therefore called many-core systems.Such systems require an efficient interconnection network that tries to address two major problems.First,the overhead of power and area cost and its effect on scalability.Second,high access latency is caused by multiple cores’simultaneous accesses of the same shared module.This paper presents an interconnection scheme called N-conjugate Shuffle Clusters(NCSC)based on multi-core multicluster architecture to reduce the overhead of the just mentioned problems.NCSC eliminated the need for router devices and their complexity and hence reduced the power and area costs.It also resigned and distributed the shared caches across the interconnection network to increase the ability for simultaneous access and hence reduce the access latency.For intra-cluster communication,Multi-port Content Addressable Memory(MPCAM)is used.The experimental results using four clusters and four cores each indicated that the average access latency for a write process is 1.14785±0.04532 ns which is nearly equal to the latency of a write operation in MPCAM.Moreover,it was demonstrated that the average read latency within a cluster is 1.26226±0.090591 ns and around 1.92738±0.139588 ns for read access between cores from different clusters.
文摘Multicore systems oftentimes use multiple levels of cache to bridge the gap between processor and memory speed.This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual port content addressable memory(DPCAM).In addition,it proposes a new replacement algorithm based on hardware which is called a near-far access replacement algorithm(NFRA)to reduce the cost overhead of the cache controller and improve the cache access latency.The experimental results indicated that the latency for write and read operations are significantly less in comparison with a set-associative cache memory.Moreover,it was shown that a latency of a read operation is nearly constant regardless of the size of DPCAM.However,an estimation of the power dissipation showed that DPCAM consumes about 7%greater than a set-associative cache memory of the same size.These results encourage for embedding DPCAM within the multicore processors as a small shared cache memory.