Abstract: A Dominant Resource Fairness (DRF) based scheme is proposed for job scheduling in distributed cloud computing systems, where scheduling is modeled as a coupled multi-job scheduling and multi-resource allocation problem. The resource pool is constructed from a large number of distributed heterogeneous servers, representing different points in the configuration space of resources such as processing, memory, storage and bandwidth. By introducing the dominant resource share of jobs and virtual machines, the joint multi-job scheduling and multi-resource allocation mechanism significantly improves the cloud system's resource utilization while substantially reducing job completion times. Experiments and case studies show the superior performance of the algorithms in practice.
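The dominant resource share mentioned in the abstract can be illustrated with a minimal sketch of textbook DRF progressive filling. This is not the paper's scheme: it assumes a single aggregated resource pool rather than heterogeneous servers and virtual machines, and the names `dominant_share` and `drf_allocate`, the stopping rule, and the example capacities and demands are all hypothetical.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(allocated: Resources, capacity: Resources) -> float:
    """Largest fraction of any single resource type currently held by a job."""
    return max(allocated[r] / capacity[r] for r in capacity)


def drf_allocate(capacity: Resources, demands: Dict[str, Resources]) -> Dict[str, int]:
    """Grant one task at a time to the job with the smallest dominant share.

    Assumption: a job whose next task no longer fits is dropped from
    consideration, and filling continues with the remaining jobs.
    """
    used = {r: 0.0 for r in capacity}
    alloc = {j: {r: 0.0 for r in capacity} for j in demands}
    tasks = {j: 0 for j in demands}
    active = set(demands)
    while active:
        # sorted() makes tie-breaking deterministic (alphabetical by job name)
        job = min(sorted(active), key=lambda j: dominant_share(alloc[j], capacity))
        need = demands[job]
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            active.remove(job)
            continue
        for r in capacity:
            used[r] += need[r]
            alloc[job][r] += need[r]
        tasks[job] += 1
    return tasks


# Hypothetical pool: 9 CPUs and 18 GB of memory shared by two jobs.
pool = {"cpu": 9, "mem": 18}
per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
result = drf_allocate(pool, per_task)
```

In this example job A is memory-dominated and job B is CPU-dominated; the filling ends with A holding 3 tasks and B holding 2, so both reach a dominant share of 2/3.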
Funding: Supported by the National Basic Research Program of China (No. 2012CB316502), the National High Technology Research and Development Program of China (No. 2009AA01A129), and the National Natural Science Foundation of China (No. 60921002).
Abstract: The wide acceptance of medical image processing and the accompanying data deluge require faster and more efficient systems. Owing to recent advances in heterogeneous architectures, there has been a resurgence of research aimed at FPGA-based as well as GPGPU-based accelerator design. This paper quantitatively analyzes the workload, computational intensity and memory performance of a single-particle 3D reconstruction application called EMAN. It parallelizes the application on CUDA GPGPU architectures, decoupling the memory operations from the computing flow and orchestrating the thread-data mapping to reduce the overhead of off-chip memory operations. It then follows the trend towards FPGA-based accelerator design by offloading computing-intensive kernels to dedicated hardware modules. Furthermore, a customized memory subsystem is designed to facilitate the decoupling and optimization of computing-dominated data access patterns. The proposed accelerator design strategies are evaluated by comparing them with a parallelized program on a 4-core CPU. The CUDA version on a GTX480 shows a speedup of about 6 times. The stream architecture implemented on a Xilinx Virtex LX330 FPGA achieves a speedup of 2.54 times. Measured in terms of power efficiency, the FPGA-based accelerator outperforms the 4-core CPU and the GTX480 by 7.3 times and 3.4 times, respectively.
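The speedup and power-efficiency figures above can be related with a short piece of arithmetic. The sketch below assumes that power efficiency means performance per unit of power, a definition the abstract does not state, and treats "about 6 times" as exactly 6; the function name `implied_power_ratio` is hypothetical, and the resulting ratios are derived estimates, not measurements reported by the paper.

```python
def implied_power_ratio(relative_speed: float, efficiency_gain: float) -> float:
    """Power draw relative to a baseline, assuming efficiency = performance / power.

    If a device is `relative_speed` times as fast as the baseline and
    `efficiency_gain` times as efficient, its power draw is their quotient.
    """
    return relative_speed / efficiency_gain


# Figures taken from the abstract (all relative to the 4-core CPU program).
fpga_speedup = 2.54      # FPGA vs CPU
gpu_speedup = 6.0        # GTX480 vs CPU ("about 6 times", taken as 6 here)
fpga_eff_vs_cpu = 7.3    # FPGA power efficiency vs CPU
fpga_eff_vs_gpu = 3.4    # FPGA power efficiency vs GTX480

# Implied FPGA power draw as a fraction of the CPU's.
fpga_power_vs_cpu = implied_power_ratio(fpga_speedup, fpga_eff_vs_cpu)

# Against the GPU, the FPGA is slower (2.54 / 6) yet more efficient.
fpga_power_vs_gpu = implied_power_ratio(fpga_speedup / gpu_speedup, fpga_eff_vs_gpu)
```

Under these assumptions the FPGA would draw roughly 0.35 times the power of the CPU and roughly 0.12 times the power of the GTX480, which is how a lower speedup can coexist with a higher power efficiency.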
Funding: Supported by RFBR research grants, Russian Federal programs and academic programs of the Russian Academy of Sciences.
Abstract: Transient effects of stress-strain fields in the vicinity of a stationary crack tip under high-rate loads are discussed. Exact analytical solutions for near-tip stresses are compared to the fields prescribed by the leading terms (one or several) of the Williams asymptotic expansion. The influence of load application mode, time (or, equivalently, distance from the crack tip) and Poisson's ratio on this discrepancy is examined extensively. Some effects connected with crack tip propagation speed are also discussed. Significant inconsistencies, observed by numerous researchers, between real crack tip fields (or those obtained from numerical solutions of the state equations, e.g. finite element computations) and the stress intensity factor (SIF) singular field are explained. The scope of problems in which the SIF field can be used for correct prediction of dynamic stress-strain fields in the crack tip region is established. The possibility of correctly approximating fields that are not SIF dominated by accounting for additional terms of the Williams expansion is studied.
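For orientation, the two leading terms of the Williams expansion referred to above can be written out in their standard textbook form. The block assumes mode I (opening) loading and polar coordinates $(r,\theta)$ centred at the crack tip; the symbols $K_I(t)$ for the time-dependent stress intensity factor and $T(t)$ for the non-singular second term are notation introduced here, not taken from the abstract.

```latex
\sigma_{ij}(r,\theta,t) \;=\; \frac{K_I(t)}{\sqrt{2\pi r}}\, f_{ij}(\theta)
  \;+\; T(t)\,\delta_{i1}\delta_{j1} \;+\; O\!\left(r^{1/2}\right),
\qquad r \to 0,

f_{11}(\theta) = \cos\frac{\theta}{2}\left[1 - \sin\frac{\theta}{2}\sin\frac{3\theta}{2}\right],\quad
f_{22}(\theta) = \cos\frac{\theta}{2}\left[1 + \sin\frac{\theta}{2}\sin\frac{3\theta}{2}\right],\quad
f_{12}(\theta) = \cos\frac{\theta}{2}\sin\frac{\theta}{2}\cos\frac{3\theta}{2}.
```

The first term is the SIF singular field; a field is "SIF dominated" in a region where this term outweighs the remaining ones, and the additional terms are the ones a multi-term approximation would retain.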