Funding: Project (50371026) supported by the National Natural Science Foundation of China
Abstract: An OpenMP approach is proposed to parallelize a sequential molecular dynamics (MD) code on shared-memory machines. When code is converted from sequential to parallel form, data dependence is the main problem. A traditional sequential MD code is analyzed to locate its data-dependent segments, and two methods, the recovery method and the backward mapping method, are used to eliminate these dependencies and thereby parallelize the code. The performance of the parallelized MD code was analyzed with performance analysis tools. The test results show that the problem size the code can handle increases sharply, from 1 million atoms before parallelization to 20 million atoms after, and that wall-clock computing time is greatly reduced. Several hot spots in the code were identified and optimized with improved algorithms. Parallel computing efficiency is 30% higher than before, which saves calculation time and allows larger-scale problems to be solved.
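The abstract names but does not define its two dependence-elimination methods, so the C sketch below only illustrates the kind of data dependence involved and two generic, commonly used remedies; it is not the authors' code. It assumes a toy 1-D system with a hypothetical spring-like pair force and cutoff (`pair_force`, `cutoff` are illustrative). In the sequential half loop each pair is visited once and Newton's third law updates both `f[i]` and `f[j]`, so parallelizing the outer loop lets two threads write the same `f[j]`. Remedy A gives each thread a private force array (an OpenMP 4.5 array-section reduction); remedy B has each thread compute the full force on the atoms it owns, writing only `f[i]` at the cost of evaluating every pair twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-D short-range pair force: force on atom i from atom j,
   with d = x[i] - x[j]. It is odd in d, so the force on j is -pair_force(d). */
static double pair_force(double d, double cutoff) {
    return (d < cutoff && d > -cutoff) ? -d : 0.0;
}

/* Remedy A: keep the sequential half loop (each pair once, Newton's third
   law) but give every thread a private copy of f that OpenMP sums at the
   end. Without the reduction, the f[j] update is a write conflict: two
   threads handling different i can update the same j. */
void forces_half_reduction(const double *x, double *f, size_t n, double cutoff) {
    assert(x != NULL && f != NULL);
    for (size_t i = 0; i < n; ++i)
        f[i] = 0.0;
#pragma omp parallel for schedule(dynamic) reduction(+ : f[:n])
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            double fij = pair_force(x[i] - x[j], cutoff);
            f[i] += fij;
            f[j] -= fij; /* safe: f is thread-private inside the reduction */
        }
    }
}

/* Remedy B: owner computes. Each iteration writes only f[i], so no two
   threads touch the same element; every pair is evaluated twice. */
void forces_full_owner(const double *x, double *f, size_t n, double cutoff) {
    assert(x != NULL && f != NULL);
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i) {
        double fi = 0.0;
        for (size_t j = 0; j < n; ++j)
            if (j != i)
                fi += pair_force(x[i] - x[j], cutoff);
        f[i] = fi;
    }
}
```

Compiled with `-fopenmp`, both functions run in parallel; without it the pragmas are ignored and they run sequentially. For three atoms at x = {0, 1, 3} with a cutoff of 2.5, both return forces {1, 1, -2}. For general inputs the two remedies can differ in the last bits, because the summation order changes.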
Abstract: Automated performance tuning of data management systems offers various benefits, such as improved performance, lower administration costs, and reduced workloads for database administrators (DBAs). Currently, DBAs tune the performance of database systems with little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. First, we show how periods of low workload can be used to improve performance in periods of high workload. We demonstrate that extending a database system with materialised views and indices while the workload is low may improve performance during a subsequent period of high workload. The paper proposes several online algorithms for continuously processing estimated database workloads, for discovering the best plan for materialised view and index extensions, and for eliminating extensions that are no longer needed. We present experimental results showing how the proposed automated performance tuning technique improves the overall performance of a data management system.
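The abstract does not describe its online algorithms, so the following C sketch is only a hypothetical illustration of the kind of decision they make: given estimated benefits for the next high-workload period, choose which materialised views or indices to build within a low-workload window, and which existing ones to drop. It swaps in a simple greedy heuristic (benefit per unit build cost); the `Extension` type, its cost and benefit fields, and the time-budget model are all assumptions, not the paper's method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate database extension (materialised view or index). */
typedef struct {
    const char *name;
    double build_cost; /* assumed: seconds to build in the low-workload window */
    double benefit;    /* assumed: seconds saved in the next high-workload period */
    int built;         /* 1 if the extension already exists */
} Extension;

enum { NO_ACTION = 0, BUILD = 1, DROP = 2 };

/* Greedy plan for one low-workload window of `window` seconds.
   Existing extensions with no expected benefit are marked DROP; new ones
   are marked BUILD in order of benefit per unit build cost while time
   remains in the window. */
void plan_extensions(const Extension *ext, size_t n, double window, int *action) {
    assert(ext != NULL && action != NULL);
    for (size_t i = 0; i < n; ++i)
        action[i] = (ext[i].built && ext[i].benefit <= 0.0) ? DROP : NO_ACTION;

    double remaining = window;
    for (;;) {
        size_t best = n;
        double best_ratio = 0.0;
        for (size_t i = 0; i < n; ++i) {
            if (ext[i].built || action[i] != NO_ACTION)
                continue;
            if (ext[i].benefit <= 0.0 || ext[i].build_cost > remaining)
                continue;
            assert(ext[i].build_cost > 0.0);
            double ratio = ext[i].benefit / ext[i].build_cost;
            if (best == n || ratio > best_ratio) {
                best = i;
                best_ratio = ratio;
            }
        }
        if (best == n)
            break; /* nothing else fits in the window or is worth building */
        action[best] = BUILD;
        remaining -= ext[best].build_cost;
    }
}
```

For example, with a 45 s window and new candidates costing 30 s, 10 s, and 50 s with estimated benefits of 120 s, 20 s, and 40 s, the sketch builds the first two (ratios 4 and 2, 40 s in total) and leaves the third, which does not fit; an existing extension with zero estimated benefit is marked for dropping. The greedy ratio rule is not optimal in general, since the selection is a knapsack problem, but it is cheap enough to rerun whenever the workload estimate changes.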
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61671457)
Abstract: Frequency tunability has become a subject of interest in high-power microwave (HPM) source research. However, little information is available about the corresponding mode converters. This paper presents a tunable circularly-polarized turnstile-junction mode converter (TCTMC) for high-power microwave applications. The input coaxial TEM mode is transformed into TE10 modes with different phase delays in four rectangular waveguides and then converted into a circularly-polarized TE11 circular waveguide mode. In addition, rods are added to reduce or even eliminate reflection. The main innovation of this study is a tuning mechanism added to the mode converter, which can change the effective length of the rectangular waveguides and the distance between the upstream rods and the nearest edge of the rectangular waveguide, thus improving conversion efficiency and bandwidth. The conversion efficiency of the TCTMC exceeds 98% over the frequency range 1.42–2.29 GHz, and the frequency tuning bandwidth is about 47%. Notably, by adjusting the tuning mechanism, the TCTMC maintains high conversion efficiency continuously across different frequency points.
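The abstract does not give the converter's dimensions, so the following worked example is only a hedged illustration of why an adjustable arm length helps. It uses the standard TE10 relations for an air-filled rectangular waveguide of broad-wall width a, with a hypothetical a = 130 mm (chosen so that TE10 propagates across 1.42–2.29 GHz while TE20, cut off near 2.31 GHz, does not) and an illustrative target delay of 90°. Neither value is taken from the paper.

```latex
\begin{aligned}
f_c &= \frac{c}{2a}, \qquad
\lambda_g = \frac{\lambda_0}{\sqrt{1-(f_c/f)^2}}, \qquad
\Delta\varphi = \frac{360^\circ\,\Delta L}{\lambda_g} \\[4pt]
a &= 130\ \text{mm} \;\Rightarrow\; f_c \approx 1.153\ \text{GHz} \\
f = 1.42\ \text{GHz} &:\ \lambda_0 \approx 211.1\ \text{mm},\ \lambda_g \approx 361.7\ \text{mm}
  \;\Rightarrow\; \Delta L_{90^\circ} = \lambda_g/4 \approx 90.4\ \text{mm} \\
f = 2.29\ \text{GHz} &:\ \lambda_0 \approx 130.9\ \text{mm},\ \lambda_g \approx 151.5\ \text{mm}
  \;\Rightarrow\; \Delta\varphi(\Delta L = 90.4\ \text{mm}) \approx 215^\circ
\end{aligned}
```

A length difference fixed to give a 90° delay at the low end of the band therefore gives roughly 215° at the high end. Adjusting the effective waveguide length, as the tuning mechanism does, is one way to restore the intended phase relation at each operating frequency.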
Abstract: More and more embedded systems now support multimedia networking over Ethernet or wireless LAN (WLAN) technologies. An embedded system, typically designed with a low-performance microprocessor to reduce both power usage and cost, often performs poorly in multimedia networking. This paper describes a case study of improving the TCP/IP networking performance of a real-world embedded uClinux multimedia system configured with both a Fast Ethernet and a Wi-Fi connection. The paper analyzes the networking overhead of the embedded system and, based on this analysis, provides specific methods to improve its networking performance. Our benchmark results indicate that these methods can improve the multimedia networking throughput of the embedded system by about 15%.
Abstract: Improving the performance of sparse matrix-vector multiplication (SpMV) is an important task, and a difficult one because of its irregular memory access. General-purpose GPUs (GPGPUs) provide high computing capability and substantial bandwidth, which SpMV cannot fully exploit because of this irregularity. In this paper, we propose two novel methods to optimize memory bandwidth usage for SpMV on GPGPUs. First, a new storage format is proposed to exploit the memory bandwidth of the GPU architecture more efficiently; it ensures that as many non-zeros as possible are stored in a form suited to exploiting the GPU's memory bandwidth. Second, we propose a cache blocking method to improve SpMV performance on the GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With this blocking, the corresponding part of vector x can be reused in the GPU cache, which greatly reduces global memory accesses for x. Experiments were carried out on three GPU platforms: GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. The results show that both methods efficiently improve the utilization of GPU memory bandwidth and SpMV performance on the GPU.
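To make the cache blocking idea concrete, here is a CPU-side C sketch; it is not the paper's GPU implementation, and the abstract does not say exactly how the sub-blocks are shaped. It assumes column blocks of a hypothetical width `bw`, each stored as its own CSR matrix, because restricting a block to a range of columns is what confines its reads to one slice of x. On a GPU the reuse would come from the hardware cache; here it is only the access pattern that is shown. Allocation failures are not checked.

```c
#include <assert.h>
#include <stdlib.h>

/* CSR sparse matrix with 0-based indices. */
typedef struct {
    int n_rows, n_cols;
    int *row_ptr; /* length n_rows + 1 */
    int *col;     /* length nnz, global column indices */
    double *val;  /* length nnz */
} Csr;

/* Reference CSR SpMV: y = A * x. */
void spmv_csr(const Csr *A, const double *x, double *y) {
    for (int i = 0; i < A->n_rows; ++i) {
        double sum = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->val[k] * x[A->col[k]];
        y[i] = sum;
    }
}

/* Split A into column blocks of width bw. Block b is itself a CSR matrix
   holding only the entries whose columns lie in [b*bw, (b+1)*bw).
   Stores the block array in *out and returns the number of blocks. */
int split_column_blocks(const Csr *A, int bw, Csr **out) {
    assert(bw > 0 && A->n_cols > 0);
    int nb = (A->n_cols + bw - 1) / bw;
    Csr *B = calloc((size_t)nb, sizeof *B);
    for (int b = 0; b < nb; ++b) {
        B[b].n_rows = A->n_rows;
        B[b].n_cols = A->n_cols;
        B[b].row_ptr = calloc((size_t)A->n_rows + 1, sizeof(int));
    }
    /* Pass 1: count entries per (block, row), then prefix-sum into row_ptr. */
    for (int i = 0; i < A->n_rows; ++i)
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            B[A->col[k] / bw].row_ptr[i + 1]++;
    for (int b = 0; b < nb; ++b) {
        for (int i = 0; i < A->n_rows; ++i)
            B[b].row_ptr[i + 1] += B[b].row_ptr[i];
        size_t nnz = (size_t)B[b].row_ptr[A->n_rows];
        B[b].col = malloc(nnz * sizeof(int));
        B[b].val = malloc(nnz * sizeof(double));
    }
    /* Pass 2: scatter each row's entries into the blocks. */
    int *next = malloc((size_t)nb * sizeof(int));
    for (int i = 0; i < A->n_rows; ++i) {
        for (int b = 0; b < nb; ++b)
            next[b] = B[b].row_ptr[i];
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k) {
            int b = A->col[k] / bw;
            B[b].col[next[b]] = A->col[k];
            B[b].val[next[b]] = A->val[k];
            next[b]++;
        }
    }
    free(next);
    *out = B;
    return nb;
}

/* Cache-blocked SpMV: y = sum over b of B_b * x. While block b is being
   processed only x[b*bw .. (b+1)*bw) is read, so that slice can stay cached. */
void spmv_blocked(const Csr *B, int nb, const double *x, double *y) {
    assert(nb > 0);
    int n = B[0].n_rows;
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;
    for (int b = 0; b < nb; ++b)
        for (int i = 0; i < n; ++i)
            for (int k = B[b].row_ptr[i]; k < B[b].row_ptr[i + 1]; ++k)
                y[i] += B[b].val[k] * x[B[b].col[k]];
}

/* Release all blocks created by split_column_blocks. */
void free_blocks(Csr *B, int nb) {
    for (int b = 0; b < nb; ++b) {
        free(B[b].row_ptr);
        free(B[b].col);
        free(B[b].val);
    }
    free(B);
}
```

For the 3 x 4 matrix with rows {1, 0, 0, 2}, {0, 3, 4, 0}, {5, 0, 6, 7} and x = {1, 2, 3, 4}, a block width of 2 gives two blocks and both routines return y = {9, 18, 51}. Because each block keeps a full row-pointer array, very narrow blocks add index overhead; the block width trades that against how much of x stays cached.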