Funding: Project supported by the National Natural Science Foundation of China (No. 61902405), the Tianhe Supercomputer Project of China (No. 2018YFB0204301), the PDL Research Fund of China (No. 6142110190404), and the National High-Level Personnel for Defense Technology Program, China (No. 2017-JCJQ-ZQ-013).
Abstract: High-performance computing (HPC) systems are about to reach a new height: exascale. At this scale, application deployment is becoming an increasingly prominent problem. Container technology solves the problems of encapsulating and migrating applications together with their execution environment. However, container images are large, and deploying an image to a large number of compute nodes is time-consuming. Although the peer-to-peer (P2P) approach brings higher transmission efficiency, it also introduces a larger network load. All of these issues lead to high application startup latency. To solve these problems, we propose the topology-aware execution environment service (TEES) for fast and agile application deployment on HPC systems. TEES creates a more lightweight execution environment for users and uses a more efficient topology-aware P2P approach to reduce deployment time. Combined with a split-step transport and launch-in-advance mechanism, TEES reduces application startup latency. On the Tianhe HPC system, TEES deploys and starts a typical application on 17560 compute nodes within 3 s. Compared to container-based application deployment, this is a 12-fold speedup, and the network load is reduced by 85%.
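To make the idea of topology-aware P2P distribution concrete, the following Python sketch compares two transfer plans on a toy cluster. It is only an illustration under stated assumptions, not the TEES algorithm: the abstract does not describe how TEES builds its distribution plan, so the two-level "one leader per switch" scheme, the function names, and the 16-node, 4-switch layout are all hypothetical.

```python
from collections import defaultdict


def binary_tree_edges(nodes):
    """Return (sender, receiver) pairs for a binary tree over nodes in list order."""
    return [(nodes[(i - 1) // 2], nodes[i]) for i in range(1, len(nodes))]


def topology_aware_edges(nodes, switch_of):
    """Plan transfers so each switch receives the data across the fabric only once.

    Assumption (not from the paper): one leader per switch forms an
    inter-switch tree, and each leader then feeds the nodes under its switch.
    nodes[0] is treated as the source that already holds the data.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[switch_of[node]].append(node)
    # Put the source's switch first so the source is the root of the leader tree.
    source_switch = switch_of[nodes[0]]
    ordered = [groups[source_switch]]
    ordered += [g for s, g in groups.items() if s != source_switch]
    leaders = [g[0] for g in ordered]
    edges = binary_tree_edges(leaders)      # inter-switch transfers
    for group in ordered:
        edges.extend(binary_tree_edges(group))  # intra-switch transfers
    return edges


def cross_switch_transfers(edges, switch_of):
    """Count transfers whose sender and receiver sit under different switches."""
    return sum(1 for a, b in edges if switch_of[a] != switch_of[b])


# Hypothetical toy cluster: 16 nodes interleaved across 4 switches.
nodes = list(range(16))
switch_of = {n: n % 4 for n in nodes}

oblivious = binary_tree_edges(nodes)
aware = topology_aware_edges(nodes, switch_of)
print("topology-oblivious:", cross_switch_transfers(oblivious, switch_of))
print("topology-aware:    ", cross_switch_transfers(aware, switch_of))
```

Both plans deliver the data to every node exactly once; they differ only in how many of those transfers cross a switch boundary, which is the kind of network load a topology-aware scheme aims to cut.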
Funding: Project supported by the Tianhe Supercomputer Project (No. 2018YFB0204301), the National Natural Science Foundation of China (No. 61902405), the PDL Research Fund (No. 6142110190404), and the National High-Level Personnel for Defense Technology Program (No. 2017-JCJQ-ZQ-013).
Abstract: Traditional high-performance computing (HPC) systems provide a standard preset environment to support scientific computation. However, HPC systems now need to support increasingly diverse applications, such as artificial intelligence and big data, and the standard preset environment can no longer meet these requirements. Users who run these emerging applications on HPC systems must manually maintain the specific dependencies (libraries, environment variables, and so on) of their applications, which increases their development and deployment burden. Moreover, the multi-user mode raises privacy problems among users. Containers such as Docker and Singularity can encapsulate a job's execution environment, but in a highly customized HPC system, cross-environment application deployment with Docker and Singularity is limited. Container images also impose a maintenance burden on system administrators. To address these problems, we propose a self-deployed execution environment (SDEE) for HPC. SDEE combines the advantages of traditional virtualization and modern containers. It provides the user with an isolated and customizable environment (similar to a virtual machine) in which the user is the root user. The user develops and debugs the application and deploys its special dependencies in this environment, and can then load the job onto compute nodes directly through the traditional HPC job management system. The job and its dependencies are analyzed, packaged, deployed, and executed automatically. This process enables transparent and rapid job deployment, which not only reduces the burden on users but also protects user privacy. Experiments show that the overhead introduced by SDEE is negligible and lower than that of both Docker and Singularity.
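The "analyzed, packaged" step in the abstract above can be pictured as computing the transitive set of dependencies a job needs and keeping only those missing from the preset environment. The Python sketch below shows that idea under loud assumptions: the abstract does not say how SDEE performs its analysis, and the function names, the dependency map, and the library names are hypothetical placeholders.

```python
def dependency_closure(job, direct_deps):
    """Return everything the job needs, directly or transitively, sorted by name."""
    needed, stack = set(), [job]
    while stack:
        item = stack.pop()
        for dep in direct_deps.get(item, ()):
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    needed.discard(job)  # guard against a cycle leading back to the job itself
    return sorted(needed)


def to_package(job, direct_deps, preset):
    """Keep only dependencies the standard preset environment does not provide."""
    return [d for d in dependency_closure(job, direct_deps) if d not in preset]


# Hypothetical dependency map and preset environment, for illustration only.
direct_deps = {
    "train_job": ["libtorch.so", "libmpi.so"],
    "libtorch.so": ["libcudart.so", "libc.so"],
    "libmpi.so": ["libc.so"],
}
preset = {"libc.so", "libmpi.so"}

print(to_package("train_job", direct_deps, preset))
```

Shipping only the difference from the preset environment, rather than a full image, is one plausible reason a self-deployed environment could stay lighter than a container image; whether SDEE works this way is not stated in the abstract.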