The network on chip(NoC)is used as a solution for the communication problems in a complex system on chip(SoC)design.To further enhance performances,the NoC architectures,a high level modeling and an evaluation met...The network on chip(NoC)is used as a solution for the communication problems in a complex system on chip(SoC)design.To further enhance performances,the NoC architectures,a high level modeling and an evaluation method based on OPNET are proposed to analyze their performances on different injection rates and traffic patterns.Simulation results for general NoC in terms of the average latency and the throughput are analyzed and used as a guideline to make appropriate choices for a given application.Finally,a MPEG4 decoder is mapped on different NoC architectures.Results prove the effectiveness of the evaluation method.展开更多
A solution is imperatively expected to meet the efficient contention resolution schemes for managing simultaneous access requests to the communication resources on the Network on Chip (NoC). Based on the ideas of conf...A solution is imperatively expected to meet the efficient contention resolution schemes for managing simultaneous access requests to the communication resources on the Network on Chip (NoC). Based on the ideas of conflict-free transmission, priority-based service, and dynamic self-adaptation to loading, this paper presents a novel scheduling algorithm for Medium Access Control (MAC) in NoC with the researches of the communication structure features of 2D mesh. The algorithm gives priority to guarantee the Quality of Service (QoS) for local input port as well as dynamic adjustment of the performance of the other ports along with input load change. The theoretical model of this algorithm is established with Markov chain and probability generating function. Mathematical analysis is made on the mean queue length and the mean inquiry cyclic time of the system. Simulated experiments are conducted to test the accuracy of the model. It turns out that the findings from theoretical analysis correspond well with those from simulated experiments. Further more, the analytical findings of the system performance demonstrate that the algorithm enables effectively strengthen the fairness and stability of data transmissions in NoC.展开更多
Aiming at the applications of NOC (network on chip) technology in rising scale and complexity on chip systems, a Torus structure and corresponding route algorithm for NOC is proposed. This Torus structure improves t...Aiming at the applications of NOC (network on chip) technology in rising scale and complexity on chip systems, a Torus structure and corresponding route algorithm for NOC is proposed. This Torus structure improves traditional Torus topology and redefines the denotations of the routers. Through redefining the router denotations and changing the original router locations, the Torus structure for NOC application is reconstructed. On the basis of this structure, a dead-lock and live-lock free route algorithm is designed according to dimension increase. System C is used to implement this structure and the route algorithm is simulated. In the four different traffic patterns, average, hotspot 13%, hotspot 67% and transpose, the average delay and normalization throughput of this Torus structure are evaluated. Then, the performance of delay and throughput between this Torus and Mesh structure is compared. The results indicate that this Torus structure is more suitable for NOC applications.展开更多
Nowadays the number of cores that are integrated into NoC (Network on Chip) systems is steadily increasing, and real application traffic, running in such multi-core environments requires more and more bandwidth. In th...Nowadays the number of cores that are integrated into NoC (Network on Chip) systems is steadily increasing, and real application traffic, running in such multi-core environments requires more and more bandwidth. In that sense, NoC architectures should be properly designed so as to provide efficient traffic engineering, as well as QoS support. Routing algorithm choice in conjunction with other parameters, such as network size and topology, traffic features (time and spatial distribution), as well as packet injection rate, packet size, and buffering capability, are all equivalently critical for designing a robust NoC architecture, on the grounds of traffic engineering and QoS provision. In this paper, a thorough numerical investigation is achieved by taking into consideration the criticality of selecting the proper routing algorithm, in conjunction with all the other aforementioned parameters. This is done via implementation of four routing evaluation traffic scenarios varying each parameter either individually, or as a set, thus exhausting all possible combinations, and making compact decisions on proper routing algorithm selection in NoC architectures. It has been shown that the simplicity of a deterministic routing algorithm such as XY, seems to be a reasonable choice, not only for random traffic patterns but also for non-uniform distributed traffic patterns, in terms of delay and throughput for 2D mesh NoC systems.展开更多
The NoC consists of processing element (PE), network interface (NI) and router. This paper proposes a hybrid scheme for Netwok of Chip (NoC), which aims at obtaining low latency and low power consumption by concerning...The NoC consists of processing element (PE), network interface (NI) and router. This paper proposes a hybrid scheme for Netwok of Chip (NoC), which aims at obtaining low latency and low power consumption by concerning wired and wireless links between routers. The main objective of this paper is to reduce the latency and power consumption of the network on chip architecture using wireless link between routers. In this paper, the power consumption is reduced by designing a low power router and latency is reduced by implementing a on-chip wireless communication as express links for transferring data from one subnet routers to another subnet routers. The average packet latency and normalized power consumption of proposed hybrid NoC router are analyzed for synthetic traffic loads as shuffle traffic, bitcomp traffic, transpose traffic and bitrev traffic. The proposed hybrid NoC router reduces the normalized power over the wired NoC by 12.18% in consumer traffic, 12.80% in AutoIndust traffic and 12.5% in MPEG2 traffic. The performance is also analyzed with real time traffic environments using Network simulator 2 tool.展开更多
Electric router is widely used for multi-core system to interconnect each other. However, with the increasing number of processor cores, the probability of communication conflict between processor cores increases, and...Electric router is widely used for multi-core system to interconnect each other. However, with the increasing number of processor cores, the probability of communication conflict between processor cores increases, and the data delay increases dramatically. With the advent of optical router, the traditional electrical interconnection mode has changed to optical interconnection mode. In the packet switched optical interconnection network, the data communication mechanism consists of 3 processes: link establishment, data transmission and link termination, but the circuit-switched data transmission method greatly limits the utilization of resources. The number of micro-ring resonators in the on-chip large-scale optical interconnect network is an important parameter affecting the insertion loss. The proposed λ-route, GWOR, Crossbar structure has a large overall network insertion loss due to the use of many micro-ring resonators. How to use the least micro-ring resonator to realize non-blocking communication between multiple cores has been a research hotspot. In order to improve bandwidth and reduce access latency, an optical interconnection structure called multilevel switching optical network on chip(MSONoC) is proposed in this paper. The broadband micro-ring resonators(BMRs) are employed to reduce the number of micro-ring resonators(MRs) in the network, and the structure can provide the service of non-blocking point to point communication with the wavelength division multiplexing(WDM) technology. The results show that compared to λ-route, GWOR, Crossbar and the new topology structure, the number of micro-ring resonators of MSONoC are reduced by 95.5%, 95.5%, 87.5%, and 60% respectively. The insertion loss of the minimum link of new topology, mesh and MSONoC structure is 0.73 dB, 0.725 dB and 0.38 dB.展开更多
In this paper, we propose a technique for lowering the latency of the communication in a NoC (network on chip). The technique, which can support two qualities of service (QoS), i.e., the guaranteed throughput (GT...In this paper, we propose a technique for lowering the latency of the communication in a NoC (network on chip). The technique, which can support two qualities of service (QoS), i.e., the guaranteed throughput (GT) and best effort (BE), is based on splitting a wider link into narrower links to increase throughput and decrease latency in the NoC. In addition, to ease the synchronization and reduce the crosstalk, we use the l-of-4 encoding for the smaller buses. The use of the encoding in the proposed NoC architecture considerably lowers the latency for both BE and GT packets. In addition, the bandwidth is increased while the power consumption of the links is reduced.展开更多
Mapping of three-dimensional network on chip is a key problem in the research of three-dimensional network on chip. The quality of the mapping algorithm used di- rectly affects the communication efficiency between IP ...Mapping of three-dimensional network on chip is a key problem in the research of three-dimensional network on chip. The quality of the mapping algorithm used di- rectly affects the communication efficiency between IP cores and plays an important role in the optimization of power consumption and throughput of the whole chip. In this paper, ba- sic concepts and related work of three-dimensional network on chip are introduced. Quantum-behaved particle swarm op- timization algorithm is applied to the mapping problem of three-dimensional network on chip for the first time. Sim- ulation results show that the mapping algorithm based on quantum-behaved particle swarm algorithm has faster con- vergence speed with much better optimization performance compared with the mapping algorithm based on particle swarm algorithm. It also can effectively reduce the power consumption of mapping of three-dimensional network on chip.展开更多
Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that m...Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that matches the needs of the target application is always critical to the overall performance. Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to deal with scalability and complexity challenges coming along with the increasing number of processing elements (PEs). However, the quest for high performance networks has led to very complex and resource-expensive NoC designs, leaving little room for the real computing force, i.e., PEs. Moreover, many techniques offer very small performance gains or none at all when network traffic is light while increasing the resource usage of routers. We argue that computation is still the primary task of multiprocessors and sufficient resources should be reserved for PEs. This paper presents our novel design and implementation of a resource-efficient communication network for multiprocessors on FPGAs. We reduce not only the required number of routers for a given number of PEs by introducing a new PE-router topology, but also the resource requirement of each router. Our communication network relies on the NEWS channels to transfer packets in a pipelined fashion following the path determined by the routing network, The implementation results on various Xilinx FPGAs show good performance in the typical range of network load for multiprocessor applications.展开更多
For the purpose of solving the shortcomings of low speed and high power consumption of asynchronous wrapper in conventional network on chips,this paper proposes a quasi delay-insensitive high-speed two-phase operation...For the purpose of solving the shortcomings of low speed and high power consumption of asynchronous wrapper in conventional network on chips,this paper proposes a quasi delay-insensitive high-speed two-phase operation mode asynchronous wrapper.The metastable state in sampling data procedure can be avoided by detecting the write/read signal, which can be used to stop the clock.Empty/full level of the registers can be determined by detecting the pulse signal of the two-phase asynchronous register,and then control the wrapper to sample input/output data.Sender wrapper and receiver wrapper consist of C elements and threshold gates,which ensure the quasi delay-insensitive characteristics and enhance the robustness.Simulations under different technology corners are implemented based on SMIC 0.18μm standard CMOS. Sender wrapper and receiver wrapper allow synchronous modules to work at the speed of 3.08 GHz and 2.98 GHz respectively with average dynamic power consumption of 1.727 mW and 1.779 mW.Its advantages of high-throughput,low-power, scalability and robustness make it a viable option for high-speed low-power interconnection of network-on-chip.展开更多
As the number of cores in chip multiprocessors (CMPs) increases, cache coherence protocol has become a key issue in integration of chip multiprocessors. Supporting cache coherence protocol in large chip multiprocess...As the number of cores in chip multiprocessors (CMPs) increases, cache coherence protocol has become a key issue in integration of chip multiprocessors. Supporting cache coherence protocol in large chip multiprocessors still faces three hurdles: design complexity, performance and scalability. This paper proposes Cache Coherent Network on Chip (CCNoC), a scheme that decouples cache coherency maintenance from processors and shared L2 caches and implements it completely in network on chip to free up processors and shared L2 caches from the chore of maintaining coherency, thereby reduces design complexity of CMPs. In this way, CCNoC also improves the performance of cache coherence protocol through reducing directory access latency and enhances scalability by avoiding massive directories overhead in shared L2 caches. In CCNoC, coherence state caches and active directory caches are implemented in the network interface components of network on chip to maintain cache coherence states for blocks in L1 caches and manage directory information for recently accessed blocks in L2 caches respectively. CCNoC provides a scalable CMP framework to tackle cache coherency which is the foundation of CMP. This paper evaluates the performance of CCNoC. Experimental results show that for a 16-core system, CCNoC improves performance by 3% on average over the conventional chip multiprocessor and by 10% at best, while reduces storage overhead by 1.8% and saves directory storage by 88%, showing good scalability.展开更多
Network on chip(NoC)is an infrastructure providing a communication platform to multiprocessor chips.Furthermore,the wormhole-switching method,which shares resources,was used to increase its efficiency;however,this can...Network on chip(NoC)is an infrastructure providing a communication platform to multiprocessor chips.Furthermore,the wormhole-switching method,which shares resources,was used to increase its efficiency;however,this can lead to congestion.Moreover,dealing with this congestion consumes more energy and correspondingly leads to increase in power consumption.Furthermore,consuming more power results in more heat and increases thermal fluctuations that lessen the life span of the infrastructures and,more importantly,the network’s performance.Given these complications,providing a method that controls congestion is a significant design challenge.In this paper,a fuzzy logic congestion control routing algorithm is presented to enhance the NoC’s performance when facing congestion.To avoid congestion,the proposed algorithm employs the occupied input buffer and the total occupied buffers of the neighboring nodes along with the maximum possible path diversity with minimal path length from instant neighbors to the destination as the selection parameters.To enhance the path selection function,the uncertainty of the fuzzy logic algorithm is used.As a result,the average delay,power consumption,and maximum delay are reduced by 14.88%,7.98%,and 19.39%,respectively.Additionally,the proposed method enhances the throughput and the total number of packets received by 14.9%and 11.59%,respectively.To show the significance,the proposed algorithm is examined using transpose traffic patterns,and the average delay is improved by 15.3%.The average delay is reduced by 3.8%in TMPEG-4(treble MPEG-4),36.6%in QPIP(quadruplicate PIP),and 20.9%in TVOPD(treble VOPD).展开更多
Occurrence of faults in Network on Chip(NoC) is inevitable as the feature size is con-tinuously decreasing and processing elements are increasing in numbers.Faults can be revocable if it is transient.Transient fault m...Occurrence of faults in Network on Chip(NoC) is inevitable as the feature size is con-tinuously decreasing and processing elements are increasing in numbers.Faults can be revocable if it is transient.Transient fault may occur inside router,or in the core or in communication wires.Examples of transient faults are overflow of buffers in router,clock skew,cross talk,etc..Revocation of transient faults can be done by retransmission of faulty packets using oblivious or adaptive routing algorithms.Irrevocable faults causes non-functionality of segment and mainly occurs during fabrication process.NoC reliability increases with the efficient routing algorithms,which can handle the maximum faults without deadlock in network.As transient faults are temporary and can be easily revoked using re-transmission of packet,permanent faults require efficient routing to route the packet by bypassing the nonfunctional segments.Thus,our focus is on the analysis of adaptive minimal path fault tolerant routing to handle the permanent faults.Comparative analysis between partial adaptive fault tolerance routing West-First,North-Last,Negative-First,Odd Even,and Minimal path Fault Tolerant routing(MinFT) algorithms with the nodes and links failure is performed using NoC Interconnect RoutinG and Application Modeling simulator(NIRGAM) for the 2D Mesh topology.Result suggests that MinFT ensures data transmission under worst conditions as compared to other adaptive routing algorithms.展开更多
This paper presents the result of experiments conducted in mesh networks on different routing algorithms, traffic generation schemes and switching schemes. A new network on chip (NoC) topology based on partial interco...This paper presents the result of experiments conducted in mesh networks on different routing algorithms, traffic generation schemes and switching schemes. A new network on chip (NoC) topology based on partial interconnection of mesh network is proposed and a routing algorithm supporting the proposed architecture is developed. The proposed architecture is similar to standard mesh networks, where four extra bidirectional channels are added which remove the congestion and hotspots compared to standard mesh networks with fewer channels. Significant improvement in delay (60% reduction) and throughput (60% increase) was observed using the proposed network and routing when compared with the ideal mesh networks. An increase in number of channels makes the switches expensive and could increase the area and power consumption. However, the proposed network can be useful in high speed applications with some compromise on area and power.展开更多
Network on chip (NoC) architectures have been proposed to resolve complex on-chip communication problems. An NoC-based mapping algorithm is shown in this paper. It can map irregular intellectual properties (IPs) cores...Network on chip (NoC) architectures have been proposed to resolve complex on-chip communication problems. An NoC-based mapping algorithm is shown in this paper. It can map irregular intellectual properties (IPs) cores onto regular tile 2-D mesh NoC architectures. The basic idea is to decompose a large IP into several dummy IPs or integrate several small IPs into one dummy IP, such that each dummy IP can fit into a single tile. It can also allocate buffer space according to the input/output degree and avoid connection congestion by adapting communication density. Experimental data indicate that using the algorithm proposed in this paper, the communication energy can be reduced about 7%.展开更多
Computer architecture is transiting from the multicore era into the heterogeneous era in which heterogeneous architectures use on-chip networks to access shared resources and how a network is configured will likely ha...Computer architecture is transiting from the multicore era into the heterogeneous era in which heterogeneous architectures use on-chip networks to access shared resources and how a network is configured will likely have a significant impact on overall performance and power consumption. Recently, heterogeneous network on chip (NoC) has been proposed not only to achieve performance comparable to that of the NoCs with buffered routers but also to reduce buffer cost and energy consumption. However, heterogeneous NoC design for heterogeneous GPU-CPU architectures has not been studied in depth. This paper first evaluates the performance and power consumption of a variety of static hot-potato based heterogeneous NoCs with different buffered and bufferless router placements, which is helpful to explore the design space for heterogeneous GPU-CPU interconnection. Then it proposes Unidirectional Flow Control (UFC), a simple credit-based flow control mechanism for heterogeneous NoC in GPU-CPU architectures to control network congestion. UFC can guarantee that there are always unoccupied entries in buffered routers to receive flits coming from adjacent bufferless routers. Our evaluations show that when compared to hot-potato routing, UFC improves performance by an average of 14.1% with energy increased by an average of 5.3% only.展开更多
基金Supported by the Natural Science Foundation of China(61076019)the China Postdoctoral Science Foundation(20100481134)+1 种基金the Natural Science Foundation of Jiangsu Province(BK2008387)the Graduate Student Innovation Foundation of Jiangsu Province(CX07B-105z)~~
文摘The network on chip(NoC)is used as a solution for the communication problems in a complex system on chip(SoC)design.To further enhance performances,the NoC architectures,a high level modeling and an evaluation method based on OPNET are proposed to analyze their performances on different injection rates and traffic patterns.Simulation results for general NoC in terms of the average latency and the throughput are analyzed and used as a guideline to make appropriate choices for a given application.Finally,a MPEG4 decoder is mapped on different NoC architectures.Results prove the effectiveness of the evaluation method.
基金Supported by the National Natural Science Foundation of China(No.61072079)
文摘A solution is imperatively expected to meet the efficient contention resolution schemes for managing simultaneous access requests to the communication resources on the Network on Chip (NoC). Based on the ideas of conflict-free transmission, priority-based service, and dynamic self-adaptation to loading, this paper presents a novel scheduling algorithm for Medium Access Control (MAC) in NoC with the researches of the communication structure features of 2D mesh. The algorithm gives priority to guarantee the Quality of Service (QoS) for local input port as well as dynamic adjustment of the performance of the other ports along with input load change. The theoretical model of this algorithm is established with Markov chain and probability generating function. Mathematical analysis is made on the mean queue length and the mean inquiry cyclic time of the system. Simulated experiments are conducted to test the accuracy of the model. It turns out that the findings from theoretical analysis correspond well with those from simulated experiments. Further more, the analytical findings of the system performance demonstrate that the algorithm enables effectively strengthen the fairness and stability of data transmissions in NoC.
基金the National Natural Science Fundation of China (60575031).
文摘Aiming at the applications of NOC (network on chip) technology in rising scale and complexity on chip systems, a Torus structure and corresponding route algorithm for NOC is proposed. This Torus structure improves traditional Torus topology and redefines the denotations of the routers. Through redefining the router denotations and changing the original router locations, the Torus structure for NOC application is reconstructed. On the basis of this structure, a dead-lock and live-lock free route algorithm is designed according to dimension increase. System C is used to implement this structure and the route algorithm is simulated. In the four different traffic patterns, average, hotspot 13%, hotspot 67% and transpose, the average delay and normalization throughput of this Torus structure are evaluated. Then, the performance of delay and throughput between this Torus and Mesh structure is compared. The results indicate that this Torus structure is more suitable for NOC applications.
文摘Nowadays the number of cores that are integrated into NoC (Network on Chip) systems is steadily increasing, and real application traffic, running in such multi-core environments requires more and more bandwidth. In that sense, NoC architectures should be properly designed so as to provide efficient traffic engineering, as well as QoS support. Routing algorithm choice in conjunction with other parameters, such as network size and topology, traffic features (time and spatial distribution), as well as packet injection rate, packet size, and buffering capability, are all equivalently critical for designing a robust NoC architecture, on the grounds of traffic engineering and QoS provision. In this paper, a thorough numerical investigation is achieved by taking into consideration the criticality of selecting the proper routing algorithm, in conjunction with all the other aforementioned parameters. This is done via implementation of four routing evaluation traffic scenarios varying each parameter either individually, or as a set, thus exhausting all possible combinations, and making compact decisions on proper routing algorithm selection in NoC architectures. It has been shown that the simplicity of a deterministic routing algorithm such as XY, seems to be a reasonable choice, not only for random traffic patterns but also for non-uniform distributed traffic patterns, in terms of delay and throughput for 2D mesh NoC systems.
文摘The NoC consists of processing element (PE), network interface (NI) and router. This paper proposes a hybrid scheme for Netwok of Chip (NoC), which aims at obtaining low latency and low power consumption by concerning wired and wireless links between routers. The main objective of this paper is to reduce the latency and power consumption of the network on chip architecture using wireless link between routers. In this paper, the power consumption is reduced by designing a low power router and latency is reduced by implementing a on-chip wireless communication as express links for transferring data from one subnet routers to another subnet routers. The average packet latency and normalized power consumption of proposed hybrid NoC router are analyzed for synthetic traffic loads as shuffle traffic, bitcomp traffic, transpose traffic and bitrev traffic. The proposed hybrid NoC router reduces the normalized power over the wired NoC by 12.18% in consumer traffic, 12.80% in AutoIndust traffic and 12.5% in MPEG2 traffic. The performance is also analyzed with real time traffic environments using Network simulator 2 tool.
基金Supported by the National Natural Science Foundation of China(No.61834005,61772417,61802304,61602377,61634004)Shaanxi Provincial Co-ordination Innovation Project of Science and Technology(No.2016KTZDGY02-04-02)+1 种基金Shaanxi Provincial Key R&D Plan(No.2017GY-060)Shaanxi International Science and Technology Cooperation Program(No.2018KW-006).
文摘Electric router is widely used for multi-core system to interconnect each other. However, with the increasing number of processor cores, the probability of communication conflict between processor cores increases, and the data delay increases dramatically. With the advent of optical router, the traditional electrical interconnection mode has changed to optical interconnection mode. In the packet switched optical interconnection network, the data communication mechanism consists of 3 processes: link establishment, data transmission and link termination, but the circuit-switched data transmission method greatly limits the utilization of resources. The number of micro-ring resonators in the on-chip large-scale optical interconnect network is an important parameter affecting the insertion loss. The proposed λ-route, GWOR, Crossbar structure has a large overall network insertion loss due to the use of many micro-ring resonators. How to use the least micro-ring resonator to realize non-blocking communication between multiple cores has been a research hotspot. In order to improve bandwidth and reduce access latency, an optical interconnection structure called multilevel switching optical network on chip(MSONoC) is proposed in this paper. The broadband micro-ring resonators(BMRs) are employed to reduce the number of micro-ring resonators(MRs) in the network, and the structure can provide the service of non-blocking point to point communication with the wavelength division multiplexing(WDM) technology. The results show that compared to λ-route, GWOR, Crossbar and the new topology structure, the number of micro-ring resonators of MSONoC are reduced by 95.5%, 95.5%, 87.5%, and 60% respectively. The insertion loss of the minimum link of new topology, mesh and MSONoC structure is 0.73 dB, 0.725 dB and 0.38 dB.
基金Project supported by the Iranian National Science Foundation
文摘In this paper, we propose a technique for lowering the latency of the communication in a NoC (network on chip). The technique, which can support two qualities of service (QoS), i.e., the guaranteed throughput (GT) and best effort (BE), is based on splitting a wider link into narrower links to increase throughput and decrease latency in the NoC. In addition, to ease the synchronization and reduce the crosstalk, we use the l-of-4 encoding for the smaller buses. The use of the encoding in the proposed NoC architecture considerably lowers the latency for both BE and GT packets. In addition, the bandwidth is increased while the power consumption of the links is reduced.
文摘Mapping of three-dimensional network on chip is a key problem in the research of three-dimensional network on chip. The quality of the mapping algorithm used di- rectly affects the communication efficiency between IP cores and plays an important role in the optimization of power consumption and throughput of the whole chip. In this paper, ba- sic concepts and related work of three-dimensional network on chip are introduced. Quantum-behaved particle swarm op- timization algorithm is applied to the mapping problem of three-dimensional network on chip for the first time. Sim- ulation results show that the mapping algorithm based on quantum-behaved particle swarm algorithm has faster con- vergence speed with much better optimization performance compared with the mapping algorithm based on particle swarm algorithm. It also can effectively reduce the power consumption of mapping of three-dimensional network on chip.
文摘Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that matches the needs of the target application is always critical to the overall performance. Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to deal with scalability and complexity challenges coming along with the increasing number of processing elements (PEs). However, the quest for high performance networks has led to very complex and resource-expensive NoC designs, leaving little room for the real computing force, i.e., PEs. Moreover, many techniques offer very small performance gains or none at all when network traffic is light while increasing the resource usage of routers. We argue that computation is still the primary task of multiprocessors and sufficient resources should be reserved for PEs. This paper presents our novel design and implementation of a resource-efficient communication network for multiprocessors on FPGAs. We reduce not only the required number of routers for a given number of PEs by introducing a new PE-router topology, but also the resource requirement of each router. Our communication network relies on the NEWS channels to transfer packets in a pipelined fashion following the path determined by the routing network, The implementation results on various Xilinx FPGAs show good performance in the typical range of network load for multiprocessor applications.
基金Supported by the National Natural Science Foundation of China under Grant Nos.60725415,60971066the National High-Tech Research and Development 863 Program of China under Grant Nos.2009AA01Z258,2009AA01Z260the National Science & Technology Important Project under Grant No.2009ZX01034-002-001-005.
文摘For the purpose of solving the shortcomings of low speed and high power consumption of asynchronous wrapper in conventional network on chips,this paper proposes a quasi delay-insensitive high-speed two-phase operation mode asynchronous wrapper.The metastable state in sampling data procedure can be avoided by detecting the write/read signal, which can be used to stop the clock.Empty/full level of the registers can be determined by detecting the pulse signal of the two-phase asynchronous register,and then control the wrapper to sample input/output data.Sender wrapper and receiver wrapper consist of C elements and threshold gates,which ensure the quasi delay-insensitive characteristics and enhance the robustness.Simulations under different technology corners are implemented based on SMIC 0.18μm standard CMOS. Sender wrapper and receiver wrapper allow synchronous modules to work at the speed of 3.08 GHz and 2.98 GHz respectively with average dynamic power consumption of 1.727 mW and 1.779 mW.Its advantages of high-throughput,low-power, scalability and robustness make it a viable option for high-speed low-power interconnection of network-on-chip.
基金supported by the National Natural Science Foundation of China under Grant Nos.60970002,60833004,60773146, and 60673145.
文摘As the number of cores in chip multiprocessors (CMPs) increases, cache coherence protocol has become a key issue in integration of chip multiprocessors. Supporting cache coherence protocol in large chip multiprocessors still faces three hurdles: design complexity, performance and scalability. This paper proposes Cache Coherent Network on Chip (CCNoC), a scheme that decouples cache coherency maintenance from processors and shared L2 caches and implements it completely in network on chip to free up processors and shared L2 caches from the chore of maintaining coherency, thereby reduces design complexity of CMPs. In this way, CCNoC also improves the performance of cache coherence protocol through reducing directory access latency and enhances scalability by avoiding massive directories overhead in shared L2 caches. In CCNoC, coherence state caches and active directory caches are implemented in the network interface components of network on chip to maintain cache coherence states for blocks in L1 caches and manage directory information for recently accessed blocks in L2 caches respectively. CCNoC provides a scalable CMP framework to tackle cache coherency which is the foundation of CMP. This paper evaluates the performance of CCNoC. Experimental results show that for a 16-core system, CCNoC improves performance by 3% on average over the conventional chip multiprocessor and by 10% at best, while reduces storage overhead by 1.8% and saves directory storage by 88%, showing good scalability.
文摘Network on chip(NoC)is an infrastructure providing a communication platform to multiprocessor chips.Furthermore,the wormhole-switching method,which shares resources,was used to increase its efficiency;however,this can lead to congestion.Moreover,dealing with this congestion consumes more energy and correspondingly leads to increase in power consumption.Furthermore,consuming more power results in more heat and increases thermal fluctuations that lessen the life span of the infrastructures and,more importantly,the network’s performance.Given these complications,providing a method that controls congestion is a significant design challenge.In this paper,a fuzzy logic congestion control routing algorithm is presented to enhance the NoC’s performance when facing congestion.To avoid congestion,the proposed algorithm employs the occupied input buffer and the total occupied buffers of the neighboring nodes along with the maximum possible path diversity with minimal path length from instant neighbors to the destination as the selection parameters.To enhance the path selection function,the uncertainty of the fuzzy logic algorithm is used.As a result,the average delay,power consumption,and maximum delay are reduced by 14.88%,7.98%,and 19.39%,respectively.Additionally,the proposed method enhances the throughput and the total number of packets received by 14.9%and 11.59%,respectively.To show the significance,the proposed algorithm is examined using transpose traffic patterns,and the average delay is improved by 15.3%.The average delay is reduced by 3.8%in TMPEG-4(treble MPEG-4),36.6%in QPIP(quadruplicate PIP),and 20.9%in TVOPD(treble VOPD).
文摘Occurrence of faults in Network on Chip(NoC) is inevitable as the feature size is con-tinuously decreasing and processing elements are increasing in numbers.Faults can be revocable if it is transient.Transient fault may occur inside router,or in the core or in communication wires.Examples of transient faults are overflow of buffers in router,clock skew,cross talk,etc..Revocation of transient faults can be done by retransmission of faulty packets using oblivious or adaptive routing algorithms.Irrevocable faults causes non-functionality of segment and mainly occurs during fabrication process.NoC reliability increases with the efficient routing algorithms,which can handle the maximum faults without deadlock in network.As transient faults are temporary and can be easily revoked using re-transmission of packet,permanent faults require efficient routing to route the packet by bypassing the nonfunctional segments.Thus,our focus is on the analysis of adaptive minimal path fault tolerant routing to handle the permanent faults.Comparative analysis between partial adaptive fault tolerance routing West-First,North-Last,Negative-First,Odd Even,and Minimal path Fault Tolerant routing(MinFT) algorithms with the nodes and links failure is performed using NoC Interconnect RoutinG and Application Modeling simulator(NIRGAM) for the 2D Mesh topology.Result suggests that MinFT ensures data transmission under worst conditions as compared to other adaptive routing algorithms.
文摘This paper presents the result of experiments conducted in mesh networks on different routing algorithms, traffic generation schemes and switching schemes. A new network on chip (NoC) topology based on partial interconnection of mesh network is proposed and a routing algorithm supporting the proposed architecture is developed. The proposed architecture is similar to standard mesh networks, where four extra bidirectional channels are added which remove the congestion and hotspots compared to standard mesh networks with fewer channels. Significant improvement in delay (60% reduction) and throughput (60% increase) was observed using the proposed network and routing when compared with the ideal mesh networks. An increase in number of channels makes the switches expensive and could increase the area and power consumption. However, the proposed network can be useful in high speed applications with some compromise on area and power.
基金the National Natural Science Foundation of China (No. 60273081)
文摘Network on chip (NoC) architectures have been proposed to resolve complex on-chip communication problems. An NoC-based mapping algorithm is shown in this paper. It can map irregular intellectual properties (IPs) cores onto regular tile 2-D mesh NoC architectures. The basic idea is to decompose a large IP into several dummy IPs or integrate several small IPs into one dummy IP, such that each dummy IP can fit into a single tile. It can also allocate buffer space according to the input/output degree and avoid connection congestion by adapting communication density. Experimental data indicate that using the algorithm proposed in this paper, the communication energy can be reduced about 7%.
基金This work was supported by the National Natural Science Foundation of China under Grant Nos. 61202076, 61202062.
文摘Computer architecture is transiting from the multicore era into the heterogeneous era in which heterogeneous architectures use on-chip networks to access shared resources and how a network is configured will likely have a significant impact on overall performance and power consumption. Recently, heterogeneous network on chip (NoC) has been proposed not only to achieve performance comparable to that of the NoCs with buffered routers but also to reduce buffer cost and energy consumption. However, heterogeneous NoC design for heterogeneous GPU-CPU architectures has not been studied in depth. This paper first evaluates the performance and power consumption of a variety of static hot-potato based heterogeneous NoCs with different buffered and bufferless router placements, which is helpful to explore the design space for heterogeneous GPU-CPU interconnection. Then it proposes Unidirectional Flow Control (UFC), a simple credit-based flow control mechanism for heterogeneous NoC in GPU-CPU architectures to control network congestion. UFC can guarantee that there are always unoccupied entries in buffered routers to receive flits coming from adjacent bufferless routers. Our evaluations show that when compared to hot-potato routing, UFC improves performance by an average of 14.1% with energy increased by an average of 5.3% only.