10 articles found
Notes of Observations at the General Election in the United States
1
Author: Kuang Zheng. International Understanding, 1997, No. 1, pp. 12-15 (4 pages)
Notes of Observations at the General Election in the United States. Kuang Zheng. At the invitation of the American Council of Young Political Leaders, I...
Keywords: Notes of Observations at the General Election in the United States
SOLVERS FOR SYSTEMS OF LARGE SPARSE LINEAR AND NONLINEAR EQUATIONS BASED ON MULTI-GPUS (Cited by: 3)
2
Authors: 刘沙, 钟诚文, 陈效鹏. Transactions of Nanjing University of Aeronautics and Astronautics (EI), 2011, No. 3, pp. 300-308 (9 pages)
Numerical treatment of engineering application problems often eventually results in solving systems of linear or nonlinear equations. The solution process on digital computational devices usually takes tremendous time because of the extremely large problem sizes encountered in most real-world engineering applications. Practical solvers for systems of linear and nonlinear equations based on multiple graphic process units (GPUs) are therefore proposed to accelerate the solving process. In the linear and nonlinear solvers, the preconditioned bi-conjugate gradient stable (PBi-CGstab) method and the Inexact Newton method are used to achieve fast and stable convergence. Multiple GPUs are utilized to obtain the larger data storage that large problems need.
Keywords: general purpose graphic process unit (GPGPU); compute unified device architecture (CUDA); system of linear equations; system of nonlinear equations; Inexact Newton method; bi-conjugate gradient stable (Bi-CGstab) method
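The preconditioned Bi-CGstab iteration named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' multi-GPU solver: it runs on the CPU in plain Python, stores the matrix densely, and uses a Jacobi (diagonal) preconditioner, which the abstract does not specify. The names `matvec`, `dot`, and `bicgstab` are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product (a real sparse solver would use CSR or similar)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Right-preconditioned Bi-CGstab with an assumed Jacobi preconditioner."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^-1 for M = diag(A)
    x = [0.0] * n
    target = tol * math.sqrt(dot(b, b))
    if target == 0.0:
        return x  # b is zero, so x = 0 already solves the system
    r = b[:]       # residual b - A*x for the zero initial guess
    r_hat = r[:]   # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = [d * pi for d, pi in zip(inv_diag, p)]   # apply preconditioner
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= target:
            x = [xi + alpha * yi for xi, yi in zip(x, y)]
            break
        z = [d * si for d, si in zip(inv_diag, s)]   # apply preconditioner
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= target:
            break
        rho = rho_new
    return x


# Small example system whose exact solution is [1, 2, 3].
A_example = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b_example = [6.0, 10.0, 8.0]
x_example = bicgstab(A_example, b_example)
```

On a GPU, the matrix-vector products and dot products are the parts that would typically be parallelized; the loop structure itself stays sequential.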
Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program
3
Authors: 赵夏, 马胜, 陈微, 王志英. Journal of Shanghai Jiaotong University (Science) (EI), 2016, No. 3, pp. 280-288 (9 pages)
Simulation is an important means of evaluating the performance of a computer architecture. Currently, the serial simulation of general purpose graphics processing unit (GPGPU) architecture is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform. We apply these two methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units in one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between different compute units. The inter-kernel parallelization method divides the kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts to perform the simulation. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel parallelization method accelerates the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. Because the two methods are orthogonal, they can be combined on multiple multi-core hosts for further performance improvements.
Keywords: general purpose graphics processing unit (GPGPU); multicore; intra-kernel; inter-kernel; parallel
Single-particle 3D reconstruction on specialized stream architecture and comparison with GPGPUs
4
Authors: 段勃, Wang Wendi, Tan Guangming, Meng Dan. High Technology Letters (EI, CAS), 2014, No. 4, pp. 333-345 (13 pages)
The wide acceptance of medical imaging and the resulting data deluge require faster and more efficient systems to be built. Owing to recent advances in heterogeneous architectures, there has been a resurgence in research aimed at FPGA-based as well as GPGPU-based accelerator design. This paper quantitatively analyzes the workload, computational intensity and memory performance of a single-particle 3D reconstruction application called EMAN. It parallelizes the application on CUDA GPGPU architectures, decouples the memory operations from the computing flow, and orchestrates the thread-data mapping to reduce the overhead of off-chip memory operations. It then exploits the trend towards FPGA-based accelerator design by offloading computing-intensive kernels to dedicated hardware modules. Furthermore, a customized memory subsystem is designed to facilitate the decoupling and optimization of computing-dominated data access patterns. The proposed accelerator design strategies are evaluated by comparing them with a parallelized program on a 4-core CPU. The CUDA version on a GTX480 shows a speedup of about 6 times. The stream architecture implemented on a Xilinx Virtex LX330 FPGA achieves a reported speedup of 2.54 times. Meanwhile, measured in terms of power efficiency, the FPGA-based accelerator outperforms the 4-core CPU and the GTX480 by 7.3 times and 3.4 times, respectively.
Keywords: Stream architecture; general purpose graphic processing unit (GPGPU); field programmable gate array (FPGA); cryo-EM
Accelerating fully resolved simulation of particle-laden flows on heterogeneous computer architectures
5
Authors: Kuang Ma, Maoqiang Jiang, Zhaohui Liu. Particuology (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 25-37 (13 pages)
An efficient computing framework, namely PFlows, for fully resolved direct numerical simulations of particle-laden flows was accelerated on NVIDIA General Processing Units (GPUs) and GPU-like accelerator (DCU) cards. The framework couples the lattice Boltzmann method for fluid flow with the immersed boundary method for fluid-particle interaction and the discrete element method for particle collision, using two fixed Eulerian meshes and one moving Lagrangian point mesh, respectively. All parts are accelerated by a fine-grained parallelism technique using CUDA on GPUs, and further using HIP on DCU cards, i.e., the calculation on each fluid grid, each immersed boundary point, each particle motion, and each pair-particle collision is handled by one compute thread, respectively. Coalesced memory accesses to LBM distribution functions with the Structure of Arrays data layout are used to maximize utilization of hardware bandwidth. Parallel reduction with shared memory for data of immersed boundary points is adopted to reduce access to global memory when integrating the particle hydrodynamic force. MPI is further used for computing on heterogeneous architectures with multiple CPUs-GPUs/DCUs. The communications between adjacent processors are hidden by overlapping them with calculations. Two benchmark cases were conducted for code validation, including a pure fluid flow and a particle-laden flow. The performance on a single accelerator shows that a V100 GPU can achieve a 7.1–11.1 times speedup, while a single DCU can achieve a 5.6–8.8 times speedup, compared to a single Xeon CPU chip (32 cores). The performance on multiple accelerators shows that parallel efficiency is 0.5–0.8 for weak scaling and 0.68–0.9 for strong scaling on up to 64 DCU cards, even for the dense flow (φ=20%). The peak performance reaches 179 giga lattice updates per second (GLUPS) on 256 DCU cards using 1 billion grids and 1 million particles. Finally, a large-scale simulation of a gas-solid flow with 1.6 billion grids and 1.6 million particles was conducted using only 32 DCU cards. This simulation shows that the present framework is promising for simulations of large-scale particle-laden flows in the upcoming exascale computing era.
Keywords: Lattice Boltzmann method; Immersed boundary method; Particle-laden flows; Heterogeneous acceleration; General Processing Units
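The benefit of the Structure of Arrays layout mentioned in the abstract above can be illustrated with a small index calculation. The sketch below is an assumption-laden illustration rather than code from PFlows: it assumes one thread per fluid cell, 19 distribution functions per cell (a D3Q19 lattice, which the abstract does not state), and a 32-thread warp. The names `aos_index`, `soa_index`, and `warp_strides` are hypothetical.

```python
def aos_index(cell, q, n_cells, n_dirs):
    """Array of Structures: all directions of one cell are stored together."""
    return cell * n_dirs + q


def soa_index(cell, q, n_cells, n_dirs):
    """Structure of Arrays: one contiguous array per direction."""
    return q * n_cells + cell


def warp_strides(index_fn, q, n_cells, n_dirs, warp_size=32):
    """Set of address strides between neighbouring threads reading direction q."""
    addrs = [index_fn(cell, q, n_cells, n_dirs) for cell in range(warp_size)]
    return {b - a for a, b in zip(addrs, addrs[1:])}


N_CELLS = 1024   # assumed grid size for the illustration
N_DIRS = 19      # assumed D3Q19 lattice

soa_strides = warp_strides(soa_index, 3, N_CELLS, N_DIRS)
aos_strides = warp_strides(aos_index, 3, N_CELLS, N_DIRS)
```

With these assumptions, neighbouring threads touch adjacent elements under SoA (stride 1), which is the pattern GPU hardware can coalesce into few memory transactions, while under AoS they are `N_DIRS` elements apart.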
Experimental study on performance of pneumatic seeding system (Cited by: 7)
6
Authors: Liu Lijing, Yang Hui, Ma Shaochun. International Journal of Agricultural and Biological Engineering (SCIE, EI, CAS), 2016, No. 6, pp. 84-90 (7 pages)
The purpose of this study was to promote the development of large-scale agricultural machines in China and meet the demand for air seeder localization. The study investigated the relationship between the working parameters and the performance of a pneumatic seeding system, using Cangmai 6004 wheat seed. A test platform for pneumatic seeding systems was developed, and a series of experiments was performed based on the quadratic general rotary unitized design and response surface methodology (RSM). The seeding rate and the air flow rate were selected as the influencing factors, and the coefficient of variation (CV) of evenness of feeding rate between rows and the CV of seeding stability of total rows were assigned as the test indexes. Regression models between factors and indexes were established, and an optimal equation for this pneumatic seeding system was established as well, which determines the proper air flow rate once the seeding rate is set according to practical agronomic requirements. For example, when the seeding rate is set to 250 kg/hm², a proper air flow rate of 7.53 m³/min is obtained. The verification experiment showed that the working parameters predicted by RSM were feasible, which might provide a theoretical basis for further research on pneumatic seed metering systems.
Keywords: pneumatic seeding system; seeding rate; air flow rate; quadratic general rotary unitized design; regression model
Optimization of working parameters for 3MGY-200 axial air-assisted sprayer in kiwifruit orchards (Cited by: 2)
7
Authors: Chenchen Gu, Zhijie Liu, Guanting Pan, Yingjun Pu, Fuzeng Yang. International Journal of Agricultural and Biological Engineering (SCIE, EI, CAS), 2020, No. 2, pp. 81-91 (11 pages)
Axial air-assisted sprayers can distribute pesticides efficiently in kiwifruit orchards. Because of improper parameter settings, most sprayers deliver either too much or too little pesticide. To identify appropriate sprayer parameters for kiwifruit trees, the vertical distribution profiles of the applied liquid spray were examined in this study. The effects of spray fan speed (SFS), spray pressure (SP) and spray distance (SD) on the distribution of the sprayed liquid in the vertical profiles were studied. The combined actions of these parameters were systematically analysed using the quadratic general rotary design test method. Regression equations for the spray liquid distributions and working factors are presented. Field confirmation experiments were carried out to optimize the parameters. Data analysis showed that the optimal sprayer working parameters are those of Group 3, with an SFS of 1900 r/min and an SP of 3.25 MPa. The results of this study provide a reference for future applications of this type of axial air-assisted sprayer in kiwifruit orchards.
Keywords: sprayer parameters; quadratic general rotary unitized design; regression equation; optimization; kiwifruit tree
A multi-scale architecture for multi-scale simulation and its application to gas-solid flows (Cited by: 1)
8
Authors: Bo Li, Guofeng Zhou, Wei Ge, Limin Wang, Xiaowei Wang, Li Guo, Jinghai Li. Particuology (SCIE, EI, CAS, CSCD), 2014, No. 4, pp. 160-169 (10 pages)
A multi-scale hardware and software architecture implementing the EMMS (energy-minimization multi-scale) paradigm is proven to be effective in the simulation of a two-dimensional gas-solid suspension. General purpose CPUs are employed for macro-scale control and optimization, and many integrated cores (MICs) operating in multiple-instruction multiple-data mode are used for a molecular dynamics simulation of the solid particles at the meso-scale. Many cores operating in single-instruction multiple-data mode, such as general purpose graphics processing units (GPGPUs), are employed for direct numerical simulation of the fluid flow at the micro-scale using the lattice Boltzmann method. This architecture is also expected to be efficient for the multi-scale simulation of other complex systems.
Keywords: general purpose graphics processing unit (GPGPU); Many integrated core (MIC); Meso-science; Multiple-instruction multiple-data; Single-instruction multiple-data; Virtual process engineering
Optimizing non-coalesced memory access for irregular applications with GPU computing
9
Authors: Ran ZHENG, Yuan-dong LIU, Hai JIN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, No. 9, pp. 1285-1301 (17 pages)
General purpose graphics processing units (GPGPUs) can be used to improve computing performance considerably for regular applications. However, irregular memory access exists in many applications, and the benefits of graphics processing units (GPUs) are less substantial for irregular applications. In recent years, several studies have presented solutions to remove static irregular memory access. However, eliminating dynamic irregular memory access with software remains a serious challenge. A pure software solution without hardware extensions or offline profiling is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are suggested to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To improve the efficiency of data reordering, the reordering operation is offloaded to a GPU to reduce overhead and data transfer. By concurrently executing the compute unified device architecture (CUDA) streams of data reordering and the data processing kernel, the overhead of data reordering can be reduced. After these optimizations, the volume of memory transactions is reduced by 16.7%-50% compared with CUSPARSE-based benchmarks, and the performance of irregular kernels is improved by 9.64%-34.9% on an NVIDIA Tesla P4 GPU.
Keywords: general purpose graphics processing units; Memory coalescing; Non-coalesced memory access; Data reordering
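The idea behind data reordering and index redirection in the abstract above can be shown with a toy transaction count. The sketch below is a simplified CPU-side model and not the authors' method: it assumes a 32-thread warp, 4-byte elements, and 128-byte memory segments (32 elements per segment), and counts one transaction per distinct segment touched by a warp. The names `count_transactions` and `reorder` are hypothetical.

```python
def count_transactions(indices, warp_size=32, elems_per_segment=32):
    """Count distinct memory segments touched by each warp, summed over warps."""
    total = 0
    for start in range(0, len(indices), warp_size):
        warp = indices[start:start + warp_size]
        total += len({i // elems_per_segment for i in warp})
    return total


def reorder(data, indices):
    """Gather data into access order and redirect indices to the new positions."""
    reordered = [data[i] for i in indices]    # data reordering
    redirected = list(range(len(indices)))    # index redirection
    return reordered, redirected


# Indirect access pattern data[indices[k]] with a large, scattered stride.
data = [float(i) for i in range(4096)]
indices = [(k * 97) % 4096 for k in range(64)]

before = count_transactions(indices)
reordered, redirected = reorder(data, indices)
after = count_transactions(redirected)
```

In this model the scattered pattern touches a different segment for every thread, while the redirected indices are contiguous and need one segment per warp. The reordering pass has its own cost, which is why the abstract describes overlapping it with the processing kernel.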
GPGPU Accelerated Fast Convolution Back-Projection for Radar Image Reconstruction
10
Authors: 周斌, 彭应宁, 叶春茂, 汤俊. Tsinghua Science and Technology (SCIE, EI, CAS), 2011, No. 3, pp. 256-263 (8 pages)
This paper describes a parallel fast convolution back-projection algorithm design for radar image reconstruction. State-of-the-art general purpose graphic processing units (GPGPU) were utilized to accelerate the processing. The implementation achieves much better performance than conventional processing systems, with a speedup of more than 890 times on NVIDIA Tesla C1060 supercomputing cards compared to an Intel P4 2.4 GHz CPU. 256×256 pixel images could be reconstructed within 6.3 s, which makes real-time imaging possible. Six platforms were tested and compared. The results show that the GPGPU super-computing system has great potential for radar image processing.
Keywords: convolution back projection (CBP); synthetic aperture radar (SAR); inverse synthetic aperture radar (ISAR); general purpose graphic processing units (GPGPU)