This paper investigates the fundamental data detection problem with burst interference in massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. In particular, burst interference may occur only on data symbols but not on pilot symbols, which means that interference information cannot be premeasured. To cancel the burst interference, we first revisit the uplink multi-user system and develop a matrix-form system model, where the covariance pattern and the low-rank property of the interference matrix are discussed. Then, we propose a turbo message passing based burst interference cancellation (TMP-BIC) algorithm to solve the data detection problem, where the constellation information of the target data is fully exploited to refine its estimate. Furthermore, in the TMP-BIC algorithm, we design one module to cope with the interference matrix by exploiting its low-rank property. Numerical results demonstrate that the proposed algorithm can effectively mitigate the adverse effects of burst interference and approach the interference-free bound.
Compressed sensing (CS) seeks algorithms that recover a sparse vector from noisy linear observations. Various Bayesian algorithms, such as sparse Bayesian learning (SBL) and approximate message passing (AMP) based algorithms, have been proposed. SBL is accurate and robust, but its computational complexity is high due to matrix inversion. AMP's performance is guaranteed only under severe restrictions on the measurement matrix, which limits its application to CS problems. To overcome these drawbacks, in this paper we present a low-complexity algorithm for the single linear model that incorporates vector AMP (VAMP) into the SBL structure with expectation maximization (EM). Specifically, we apply variance auto-tuning in VAMP to implement the E step in SBL, which decreases the number of iterations required to converge compared with the VAMP-EM algorithm when using a Gaussian mixture (GM) prior. Simulation results show that the proposed algorithm has better performance with high robustness under various cases of difficult measurement matrices.
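For orientation, the basic AMP iteration that this family of algorithms builds on can be sketched as follows. This is a minimal sketch of plain AMP with a soft-thresholding denoiser, not the SBL/VAMP-EM algorithm proposed in the paper. The function names, the threshold rule (`alpha` times the residual level), and all problem sizes are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Soft-thresholding denoiser: shrink each entry of v toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def amp_recover(A, y, alpha=1.5, n_iter=30):
    """Plain AMP for y = A x + noise with sparse x (illustrative sketch)."""
    M, N = A.shape
    x = np.zeros(N)
    z = y.copy()  # Onsager-corrected residual
    for _ in range(n_iter):
        # Threshold proportional to the estimated effective noise level.
        tau = alpha * np.linalg.norm(z) / np.sqrt(M)
        x = soft_threshold(x + A.T @ z, tau)
        # Onsager correction: previous residual scaled by (#nonzeros / M).
        z = y - A @ x + z * (np.count_nonzero(x) / M)
    return x


def demo(seed=0, M=250, N=500, K=25, noise_std=0.01):
    """Recover a K-sparse vector from M noisy Gaussian measurements; return NMSE."""
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian matrix with roughly unit-norm columns (assumed setting).
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x_true[support] = rng.standard_normal(K)
    y = A @ x_true + noise_std * rng.standard_normal(M)
    x_hat = amp_recover(A, y)
    return np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)
```

The i.i.d. Gaussian matrix used in `demo` is exactly the kind of restriction the abstract refers to: plain AMP is known to behave well in that setting and can diverge for ill-conditioned or correlated matrices, which is the motivation for VAMP-style variants.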
To overcome the limitations of conventional speech enhancement methods, such as inaccurate voice activity detection (VAD) and noise estimation, a novel speech enhancement algorithm based on approximate message passing (AMP) is adopted. AMP exploits the difference between speech and noise sparsity to remove or mute the noise from the corrupted speech, and reconstructs the clean speech efficiently. More specifically, the prior probability distribution of the speech sparsity coefficients is characterized by a Gaussian model, and the hyper-parameters of the prior model are learned by the expectation maximization (EM) algorithm. We utilize the k-nearest neighbor (k-NN) algorithm to learn the sparsity, based on the fact that the speech coefficients of adjacent frames are correlated. In addition, computational simulations are used to validate the proposed algorithm, which achieves better speech enhancement performance than four baseline methods: Wiener filtering, subspace pursuit (SP), distributed sparsity adaptive matching pursuit (DSAMP), and expectation-maximization Gaussian-model approximate message passing (EM-GAMP), under different compression ratios and a wide range of signal-to-noise ratios (SNRs).
The orthogonal time frequency space (OTFS) modulation has emerged as a promising modulation scheme for wireless communications in high-mobility scenarios. An efficient detector is of paramount importance to harvesting the time and frequency diversities promised by OTFS. Recently, some message passing based detectors have been developed by exploiting the features of the OTFS channel matrices. In this paper, we provide an overview of some recent message passing based OTFS detectors, compare their performance, and shed some light on potential research on the design of message passing based OTFS receivers.
Message passing algorithms, whose iterative nature captures complicated interactions among interconnected variables in complex systems and extracts information from the fixed point of iterated messages, provide a powerful toolkit for tackling hard computational tasks in optimization, inference, and learning problems. In the context of constraint satisfaction problems (CSPs), when a control parameter (such as constraint density) is tuned, multiple threshold phenomena emerge, signaling fundamental structural transitions in their solution space. Finding solutions around these transition points is exceedingly challenging for algorithm design, because message passing algorithms suffer from large message fluctuations far from convergence. Here we introduce a residual-based updating step into message passing algorithms, in which messages with large variation between consecutive steps are given high priority in the updating process. For the specific example of model RB (revised B), a typical prototype of random CSPs with growing domains, we show that our algorithm improves the convergence of message updating and increases the success probability of finding solutions around the satisfiability threshold at a low computational cost. Our approach to message passing algorithms should be of value for exploring their power in developing algorithms to find ground-state solutions and to understand the detailed structure of the solution space of hard optimization problems.
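The idea of giving priority to the quantities that changed most can be illustrated on a much simpler problem than model RB. The sketch below is a toy stand-in, not the paper's algorithm: it solves a linear fixed-point equation x = Bx + c by always updating the entry whose pending change (its residual) is largest. The function names and the test problem are assumptions; `B` is scaled so that its absolute column sums equal 0.5, which guarantees that this greedy schedule converges.

```python
import numpy as np


def residual_scheduled_fixed_point(B, c, tol=1e-10, max_updates=100_000):
    """Solve x = B x + c, always updating the entry with the largest residual."""
    x = np.zeros(len(c))
    # residual[i] is how much x[i] would change if it were updated right now.
    residual = B @ x + c - x
    for step in range(max_updates):
        i = int(np.argmax(np.abs(residual)))
        if abs(residual[i]) < tol:
            return x, step
        delta = residual[i]
        x[i] += delta
        # Changing x[i] shifts every residual through column i of B ...
        residual += B[:, i] * delta
        # ... and removes the applied change from entry i's own residual.
        residual[i] -= delta
    return x, max_updates


def demo(seed=0, n=20):
    """Build a small contractive problem and solve it with residual scheduling."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n, n))
    B = 0.5 * R / np.abs(R).sum(axis=0)  # absolute column sums equal 0.5
    c = rng.standard_normal(n)
    x, updates = residual_scheduled_fixed_point(B, c)
    return B, c, x, updates
```

In a belief propagation setting the entries of `x` would be messages and the update rule would be nonlinear, but the scheduling pattern (track residuals, update the largest first, refresh the residuals it affects) is the same.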
Graph Convolutional Neural Networks (GCNs) have been widely used in various fields due to their powerful capabilities in processing graph-structured data. However, GCNs encounter significant challenges when applied to scale-free graphs with power-law distributions, resulting in substantial distortions. Moreover, most existing GCN models are shallow structures, which restricts their ability to capture dependencies among distant nodes and more refined high-order node features in scale-free graphs with hierarchical structures. To more broadly and precisely apply GCNs to real-world graphs exhibiting scale-free or hierarchical structures, and to utilize multi-level aggregation of GCNs for capturing high-level information in local representations, we propose the Hyperbolic Deep Graph Convolutional Neural Network (HDGCNN), an end-to-end deep graph representation learning framework that can map scale-free graphs from Euclidean space to hyperbolic space. In HDGCNN, we define the fundamental operations of deep graph convolutional neural networks in hyperbolic space. Additionally, we introduce a hyperbolic feature transformation method based on identity mapping and a dense connection scheme based on a novel non-local message passing framework. We also present a neighborhood aggregation method that combines initial structural features with hyperbolic attention coefficients. Through these methods, HDGCNN effectively leverages both the structural features and node features of graph data, enabling enhanced exploration of non-local structural features and more refined node features in scale-free or hierarchical graphs. Experimental results demonstrate that HDGCNN achieves remarkable performance improvements over state-of-the-art GCNs in node classification and link prediction tasks, even when utilizing low-dimensional embedding representations. Furthermore, when compared to shallow hyperbolic graph convolutional neural network models, HDGCNN exhibits notable advantages and performance enhancements.
Extra-large scale multiple-input multiple-output (XL-MIMO) for beyond fifth/sixth generation mobile communications is a promising technology to provide Tbps data transmission and stable access service. However, the extremely large antenna array aperture gives rise to the channel near-field effect, resulting in a deteriorated data rate and other challenges in practical communication systems. Meanwhile, multi-panel MIMO technology has attracted extensive attention due to its flexible configuration, low hardware cost, and wider coverage. By combining XL-MIMO and the multi-panel array structure, we construct multi-panel XL-MIMO and apply it to massive Internet of Things (IoT) access. First, we model the multi-panel XL-MIMO-based near-field channels for massive IoT access scenarios, where the electromagnetic waves corresponding to different panels have different angles of arrival/departure (AoAs/AoDs). Then, by exploiting the sparsity of the near-field massive IoT access channels, we formulate a compressed sensing based joint active user detection (AUD) and channel estimation (CE) problem, which is solved by the AMP-EM-MMV algorithm. The simulation results exhibit the superiority of the AMP-EM-MMV based joint AUD and CE scheme over the baseline algorithms.
Orthogonal Time Frequency Space (OTFS) signaling with index modulation (IM) is a promising transmission scheme characterized by high transmission efficiency for high-mobility scenarios. In this paper, we study the receiver for the coded OTFS-IM system. First, we construct the corresponding factor graph, on which a structured prior incorporating the activation pattern constraint and channel coding is devised. Then we develop an iterative receiver via a structured prior-based hybrid belief propagation (BP) and expectation propagation (EP) algorithm, named StrBP-EP, for the coded OTFS-IM system. To reduce the computational complexity of the discrete distribution introduced by the structured prior, a Gaussian approximation conducted by EP is adopted. To further reduce the complexity, we derive two variations of the proposed algorithm by using some approximations. Simulation results validate the superior performance of the proposed algorithm.
In this paper, a powerful model-driven deep learning framework is exploited to overcome the challenge of multi-domain signal detection in space-domain index modulation (SDIM) based multiple-input multiple-output (MIMO) systems. Specifically, we use the orthogonal approximate message passing (OAMP) technique to develop OAMPNet, which is a novel signal recovery mechanism in the field of compressed sensing that effectively uses the sparse property of the training SDIM samples. For OAMPNet, the prior probability of the transmit signal has a significant impact on the obtainable performance. For this reason, in our design, we first derive the prior probability of the transmit signal on each antenna for SDIM-MIMO systems, which differs from that of conventional massive MIMO systems. Then, for massive MIMO scenarios, we propose two novel algorithms to avoid pre-storing all active antenna combinations, thus considerably improving the memory efficiency and reducing the related overhead. Our simulation results show that the proposed framework outperforms the conventional optimization-driven detection algorithms and has strong robustness under different antenna scales.
The orthogonal time frequency space (OTFS) technique, which modulates data symbols in the delay-Doppler (DD) domain, presents a potential solution for supporting reliable information transmission in high-mobility vehicular networks. In this paper, we study DD channel estimation for OTFS in the presence of fractional Doppler. We first propose a channel estimation algorithm with both low complexity and high accuracy based on unitary approximate message passing (UAMP), which exploits the structured sparsity of the effective DD domain channel using a hidden Markov model (HMM). The empirical state evolution (SE) analysis is then leveraged to predict the performance of our proposed algorithm. To refine the hyperparameters in the proposed algorithm, we derive their update criterion through the expectation-maximization (EM) algorithm. Finally, our simulation results demonstrate that the proposed algorithm can achieve a significant gain over various baseline schemes.
The newly emerging orthogonal time frequency space (OTFS) modulation can obtain delay-Doppler diversity gain to significantly improve the system performance in high-mobility wireless communication scenarios such as vehicle-to-everything (V2X), high-speed railway, and unmanned aerial vehicles (UAV), by employing the inverse symplectic finite Fourier transform (ISFFT) and the symplectic finite Fourier transform (SFFT). However, OTFS modulation dramatically increases system complexity, especially at the receiver side. Thus, designing a low-complexity OTFS receiver is a key issue for OTFS modulation to be adopted by new-generation wireless communication systems. In this paper, we review low-complexity OTFS detectors and provide some insights on future research. We first present the OTFS system model and basic principles, followed by an overview of OTFS detector structures, classifications, and a comparative discussion. We also survey the principles of OTFS detection algorithms. Furthermore, we discuss the design of hybrid OTFS and orthogonal frequency division multiplexing (OFDM) detectors in single-user and multi-user multi-waveform communication systems. Finally, we address the main challenges in designing low-complexity OTFS detectors and identify some future research directions.
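The ISFFT/SFFT pair mentioned above can be sketched with two one-dimensional FFTs per transform. This is a minimal sketch under an assumed convention (delay bins along axis 0, Doppler bins along axis 1, unitary scaling); sign and scaling conventions differ between papers, and the function names are illustrative.

```python
import numpy as np


def isfft(x_dd):
    """Delay-Doppler grid (M delay bins x N Doppler bins) -> time-frequency grid."""
    # DFT along the delay axis, inverse DFT along the Doppler axis.
    return np.fft.fft(np.fft.ifft(x_dd, axis=1, norm="ortho"), axis=0, norm="ortho")


def sfft(x_tf):
    """Time-frequency grid -> delay-Doppler grid (inverse of isfft above)."""
    return np.fft.ifft(np.fft.fft(x_tf, axis=1, norm="ortho"), axis=0, norm="ortho")


def demo(seed=0, M=8, N=4):
    """Map random QPSK symbols from the DD grid to the TF grid and back."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(M, N, 2))
    x_dd = ((2 * bits[..., 0] - 1) + 1j * (2 * bits[..., 1] - 1)) / np.sqrt(2)
    x_tf = isfft(x_dd)
    return x_dd, x_tf, sfft(x_tf)
```

With the unitary (`norm="ortho"`) scaling the transform preserves signal energy, and applying `sfft` after `isfft` returns the original delay-Doppler symbols.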
We present a time-domain hybrid method for fast coupling analysis of transmission lines excited by space electromagnetic fields, in which the parallel finite-difference time-domain (FDTD) method, an interpolation scheme, and Agrawal model-based transmission line (TL) equations are integrated. Specifically, the Agrawal model is employed to establish the TL equations that describe the coupling effects of space electromagnetic fields on transmission lines. Then, the excitation fields, which act as distributed sources in the TL equations, are calculated by the parallel FDTD method using the message passing interface (MPI) library and the interpolation scheme. Finally, the TL equations are discretized by the central difference scheme of FDTD and assigned to multiple processors to obtain the transient responses on the terminal loads of these lines. The significant feature of the presented method is its parallel and synchronous calculation of the space electromagnetic fields and the transient responses on the lines. Numerical simulations of an ambient wave acting on multi-conductor transmission lines (MTLs), located on a PEC ground and in a shielded cavity respectively, are implemented to verify the accuracy and efficiency of the presented method.
Explosion and shock often involve large deformation, interface treatment between multiple materials, and strong discontinuity. The Eulerian method has advantages for solving these problems. In parallel computation with the Eulerian method, the physical quantities of the computational cells do not change before the disturbance reaches these cells. Computational efficiency is low when using a fixed partition because of load imbalance. To solve this problem, a dynamic parallel method in which the computation domain expands with the disturbance is used. The dynamic parallel program is designed based on the widely used message passing interface model. The numerical results of the dynamic parallel program agree well with those of the original parallel program and with the actual situation.
We propose an improved real-space parallel strategy for the density matrix renormalization group (DMRG) method, where the boundaries of separate regions are adaptively distributed during DMRG sweeps. Our scheme greatly improves the parallel efficiency, with shorter waiting time between two adjacent tasks, compared with the original real-space parallel DMRG with fixed boundaries. We implement our new strategy based on the message passing interface (MPI), and dynamically control the number of kept states according to the truncation error in each DMRG step. We study the performance of the new parallel strategy by calculating the ground state of a spin-cluster chain and a quantum chemical Hamiltonian of the water molecule. The maximum parallel efficiencies for these two models are 91% and 76% on 4 nodes, which are much higher than those of the real-space parallel DMRG with fixed boundaries.
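To put the quoted efficiencies in terms of speedup, one can assume the standard definition of parallel efficiency, E_p = S_p / p with S_p = T_1 / T_p. The abstract does not state which definition or baseline the authors use, so the following is only an illustrative conversion; the superscript labels are ours.

```latex
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
\quad\Longrightarrow\quad
S_p = p\,E_p,
\qquad
S_4^{\text{spin}} = 4 \times 0.91 = 3.64,
\qquad
S_4^{\text{water}} = 4 \times 0.76 = 3.04 .
```

Under that assumption, the 4-node runs would be roughly 3.6 and 3.0 times faster than a single-node run for the spin-cluster chain and the water molecule, respectively.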
Low-complexity detectors play an essential role in massive multiple-input multiple-output (MIMO) transmissions. In this work, we discuss the prospects of applying the approximate message passing (AMP) algorithm to the detection of massive MIMO transmissions. To this end, we need to efficiently reduce the occurrence of divergence in AMP iterations and bridge the performance gap between AMP and the optimum detector, while making use of its low computational load. Our solution is to build a neural network to learn and optimize AMP detection with four groups of specifically designed learnable coefficients, such that the divergence rate and the detection mean squared error (MSE) can be significantly reduced. Moreover, the proposed deep learning-based AMP converges much faster, and thus has a much lower computational complexity than conventional AMP, providing an alternative solution for massive MIMO detection. Extensive simulation experiments are provided to validate the advantages of the proposed deep learning-based AMP.
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
The Slingshot interconnect designed by HPE/Cray is becoming more relevant in high-performance computing with its deployment on the upcoming exascale systems. In particular, it is the interconnect empowering the first exascale and highest-ranked supercomputer in the world, Frontier. It offers various features such as adaptive routing, congestion control, and isolated workloads. The deployment of newer interconnects sparks interest related to performance, scalability, and any potential bottlenecks, as they are critical elements contributing to the scalability across nodes on these systems. In this paper, we delve into the challenges the Slingshot interconnect poses with current state-of-the-art MPI (message passing interface) libraries. In particular, we look at the scalability performance when using Slingshot across nodes. We present a comprehensive evaluation using various MPI and communication libraries, including Cray MPICH, Open-MPI+UCX, RCCL, and MVAPICH2, on CPUs and GPUs on the Spock system, an early access cluster deployed with Slingshot-10, AMD MI100 GPUs, and AMD Epyc Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect.
It has always been difficult to obtain accurate channel information for underwater acoustic communications because of the severe underwater propagation conditions, including frequency selectivity, high relative mobility, long propagation latency, and intensive ambient noise. To this end, a deep unfolding neural network based approach is proposed, in which multiple layers of the network mimic the iterations of a classical iterative sparse approximation algorithm to extract the inherent sparse features of the channel by exploiting deep learning, and a scheme based on the Sparsity-Aware DNN (SA-DNN) for underwater acoustic channel (UAC) estimation is proposed to improve the estimation accuracy. Moreover, we propose an enhanced method based on a Denoising Sparsity-Aware DNN (DeSA-DNN), which integrates a denoising CNN module into the sparsity-aware deep network, so that the degradation caused by intensive ambient noise can be reduced and the estimation accuracy can be further improved. Simulation results demonstrate that the proposed schemes outperform state-of-the-art compressed sensing based and iterative sparse recovery schemes in terms of channel recovery precision, pilot overhead, and robustness, particularly under non-ideal conditions of intensive ambient noise or inadequate measurement pilots.
MODerate resolution atmospheric TRANsmission (MODTRAN) is a commercial remote sensing (RS) software package that has been widely used to simulate radiative transfer of electromagnetic radiation through the Earth’s atmosphere and the radiation observed by a remote sensor. However, when very large RS datasets must be processed in simulation applications at a global scale, it is extremely time-consuming to operate MODTRAN on a modern workstation. Under this circumstance, the use of parallel cluster computing to speed up the process becomes vital to this time-consuming task. This paper presents PMODTRAN, an implementation of a parallel task-scheduling algorithm based on MODTRAN. PMODTRAN was able to reduce the processing time of the test cases used here from over 4.4 months on a workstation to less than a week on a local computer cluster. In addition, PMODTRAN can distribute tasks with different levels of granularity and has some extra features, such as dynamic load balancing and parameter checking.
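Dynamic load balancing of independent simulation runs can be sketched with a shared work queue: whenever a worker becomes idle it pulls the next pending case, so long and short cases even out across workers. The sketch below uses Python's standard `concurrent.futures` thread pool as a stand-in for a cluster scheduler. It is not PMODTRAN's implementation, and `run_case`, `run_all`, and the fake per-case costs are hypothetical names and values.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_case(case_id, cost):
    """Stand-in for one simulation run; `cost` is a fake duration in seconds."""
    time.sleep(cost)
    return case_id, cost


def run_all(cases, n_workers=4):
    """Dynamically schedule cases: each idle worker pulls the next pending case."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_case, cid, cost) for cid, cost in cases]
        # Collect results in completion order, not submission order.
        for fut in as_completed(futures):
            cid, cost = fut.result()
            results[cid] = cost
    return results
```

A static split would instead assign a fixed block of cases to each worker up front, which leaves workers idle when case durations vary; the queue-based pattern avoids that at the cost of a small scheduling overhead per case.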
A moisture advection scheme is an essential module of a numerical weather/climate model, representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computation of the scalar advection involves boundary exchange and has high bandwidth requirements, which makes it complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems owing to advancements in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present an accelerated PRM scalar advection scheme with Message Passing Interface (MPI) and OpenACC to fully exploit GPUs’ power over a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimizations such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show that a speedup of about 3.5 times is obtained for the entire model running at medium resolution with double precision when comparing the scheme’s elapsed time on a node with two GPUs (NVIDIA P100) and two 16-core CPUs (Intel Gold 6142). Further, results obtained from experiments with a higher-resolution model on multiple GPUs show excellent scalability.
Funding (TMP-BIC burst interference cancellation paper): supported by the National Key Laboratory of Wireless Communications Foundation, China (IFN20230204).
Funding (SBL with VAMP compressed sensing paper): supported by NSFC projects (61960206005, 61803211, 61871111, 62101275, 62171127, 61971136, and 62001056); Jiangsu NSF project (BK20200820); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX210106); Research Fund of National Mobile Communications Research Laboratory.
Funding (AMP-based speech enhancement paper): supported by the National Natural Science Foundation of China (NSFC) (No. 61671075); Major Program of National Natural Science Foundation of China (No. 61631003).
Funding: Supported by the National Natural Science Foundation of China (61901417, U1804152, 61801434) and the Science and Technology Research Project of Henan Province (212102210556, 212102210566, 212400410179).
Abstract: Orthogonal time frequency space (OTFS) modulation has emerged as a promising modulation scheme for wireless communications in high-mobility scenarios. An efficient detector is of paramount importance for harvesting the time and frequency diversity promised by OTFS. Recently, several message passing based detectors have been developed by exploiting the features of the OTFS channel matrices. In this paper, we provide an overview of some recent message passing based OTFS detectors, compare their performance, and shed some light on potential research on the design of message passing based OTFS receivers.
Funding: Supported by the Guangdong Major Project of Basic and Applied Basic Research (No. 2020B0301030008), the Science and Technology Program of Guangzhou (No. 2019050001), the Chinese Academy of Sciences (Grant QYZDJ-SSWSYS018), and the National Natural Science Foundation of China (Grant No. 12171479); also supported by the National Natural Science Foundation of China (Grant Nos. 11301339 and 11491240108).
Abstract: Message passing algorithms, whose iterative nature captures complicated interactions among interconnected variables in complex systems and extracts information from the fixed point of iterated messages, provide a powerful toolkit for tackling hard computational tasks in optimization, inference, and learning problems. In the context of constraint satisfaction problems (CSPs), when a control parameter (such as constraint density) is tuned, multiple threshold phenomena emerge, signaling fundamental structural transitions in their solution space. Finding solutions around these transition points is exceedingly challenging for algorithm design, because message passing algorithms there suffer from large message fluctuations and remain far from convergence. Here we introduce a residual-based updating step into message passing algorithms, in which messages with large variation between consecutive steps are given high priority in the updating process. For the specific example of model RB (revised B), a typical prototype of random CSPs with growing domains, we show that our algorithm improves the convergence of message updating and increases the success probability of finding solutions around the satisfiability threshold at a low computational cost. Our approach to message passing algorithms should be of value for exploring their power in developing algorithms that find ground-state solutions and in understanding the detailed structure of the solution space of hard optimization problems.
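The residual-based scheduling idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm for model RB: it assumes a small linear fixed-point system x = b + A x in place of real CSP messages, and the names `residual_schedule`, `A`, and `b` are hypothetical. At each step only the component whose pending change (its residual) is largest gets updated.

```python
def residual_schedule(A, b, tol=1e-10, max_steps=10_000):
    """Iterate x_i <- b_i + sum_j A[i][j] * x_j one component at a time,
    always committing the component with the largest pending change."""
    n = len(b)
    x = [0.0] * n

    def proposal(i):
        # Value component i would take if it were updated right now.
        return b[i] + sum(A[i][j] * x[j] for j in range(n))

    steps = 0
    while steps < max_steps:
        residuals = [abs(proposal(i) - x[i]) for i in range(n)]
        worst = max(range(n), key=residuals.__getitem__)
        if residuals[worst] < tol:
            break  # every pending change is negligible: fixed point reached
        x[worst] = proposal(worst)
        steps += 1
    return x, steps


# Toy system (an assumption for illustration) whose fixed point is (1, 2, 3).
A = [[0.0, 0.2, 0.1],
     [0.2, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
b = [0.3, 0.9, 2.3]
x, steps = residual_schedule(A, b)
print(x, steps)
```

In a real message passing solver the components would be the messages on the edges of a factor graph, and only the residuals of messages that depend on the updated one would need to be recomputed; the full recomputation here keeps the sketch short.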
Funding: Supported by the National Natural Science Foundation of China-China State Railway Group Co., Ltd. Railway Basic Research Joint Fund (Grant No. U2268217) and the Scientific Funding for China Academy of Railway Sciences Corporation Limited (No. 2021YJ183).
Abstract: Graph Convolutional Neural Networks (GCNs) have been widely used in various fields due to their powerful capabilities in processing graph-structured data. However, GCNs encounter significant challenges when applied to scale-free graphs with power-law distributions, resulting in substantial distortions. Moreover, most existing GCN models are shallow, which restricts their ability to capture dependencies among distant nodes and more refined high-order node features in scale-free graphs with hierarchical structures. To apply GCNs more broadly and precisely to real-world graphs exhibiting scale-free or hierarchical structures, and to use the multi-level aggregation of GCNs to capture high-level information in local representations, we propose the Hyperbolic Deep Graph Convolutional Neural Network (HDGCNN), an end-to-end deep graph representation learning framework that can map scale-free graphs from Euclidean space to hyperbolic space. In HDGCNN, we define the fundamental operations of deep graph convolutional neural networks in hyperbolic space. Additionally, we introduce a hyperbolic feature transformation method based on identity mapping and a dense connection scheme based on a novel non-local message passing framework. We also present a neighborhood aggregation method that combines initial structural features with hyperbolic attention coefficients. Through these methods, HDGCNN effectively leverages both the structural features and the node features of graph data, enabling enhanced exploration of non-local structural features and more refined node features in scale-free or hierarchical graphs. Experimental results demonstrate that HDGCNN achieves remarkable performance improvements over state-of-the-art GCNs in node classification and link prediction tasks, even when using low-dimensional embedding representations. Furthermore, compared with shallow hyperbolic graph convolutional neural network models, HDGCNN exhibits notable advantages and performance enhancements.
Funding: Supported by the National Key Research and Development Program of China under Grants 2021YFB1600500, 2021YFB3201502, and 2022YFB3207704; the Natural Science Foundation of China (NSFC) under Grants U2233216, 62071044, 61827901, 62088101, and 62201056; the Shandong Province Natural Science Foundation under Grant ZR2022YQ62; the Beijing Nova Program; and the Beijing Institute of Technology Research Fund Program for Young Scholars under Grant XSQD-202121009.
Abstract: Extra-large scale multiple-input multiple-output (XL-MIMO) is a promising technology for beyond-fifth-generation and sixth-generation mobile communications, providing Tbps data transmission and stable access service. However, the extremely large antenna array aperture gives rise to near-field channel effects, resulting in a degraded data rate and other challenges in practical communication systems. Meanwhile, multi-panel MIMO technology has attracted extensive attention due to its flexible configuration, low hardware cost, and wider coverage. By combining XL-MIMO with a multi-panel array structure, we construct multi-panel XL-MIMO and apply it to massive Internet of Things (IoT) access. First, we model the multi-panel XL-MIMO-based near-field channels for massive IoT access scenarios, where the electromagnetic waves corresponding to different panels have different angles of arrival/departure (AoAs/AoDs). Then, by exploiting the sparsity of the near-field massive IoT access channels, we formulate a compressed sensing based joint active user detection (AUD) and channel estimation (CE) problem, which is solved by the AMP-EM-MMV algorithm. The simulation results show the superiority of the AMP-EM-MMV based joint AUD and CE scheme over the baseline algorithms.
Funding: Supported in part by the National Key Research and Development Program of China (No. 2021YFB2900600) and in part by the National Natural Science Foundation of China under Grant 61971041 and Grant 62001027.
Abstract: Orthogonal Time Frequency Space (OTFS) signaling with index modulation (IM) is a promising transmission scheme characterized by high transmission efficiency in high-mobility scenarios. In this paper, we study the receiver for the coded OTFS-IM system. First, we construct the corresponding factor graph, on which a structured prior incorporating the activation pattern constraint and channel coding is devised. Then we develop an iterative receiver for the coded OTFS-IM system based on a structured prior-based hybrid belief propagation (BP) and expectation propagation (EP) algorithm, named StrBP-EP. To reduce the computational complexity of the discrete distribution introduced by the structured prior, the Gaussian approximation performed by EP is adopted. To further reduce the complexity, we derive two variations of the proposed algorithm by using some approximations. Simulation results validate the superior performance of the proposed algorithm.
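The Gaussian approximation mentioned in the abstract above rests on moment matching: a discrete belief over constellation points is replaced by a Gaussian with the same mean and variance. The sketch below shows only that projection step, under assumptions of our own (an unnormalized QPSK alphabet and the hypothetical helper `gaussian_projection`); it is not the StrBP-EP receiver, whose messages and structured prior are not specified in the abstract.

```python
def gaussian_projection(points, probs):
    """Return the mean and variance of a discrete distribution over complex
    constellation points (the moment-matching step used in EP)."""
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize defensively
    mean = sum(p * s for p, s in zip(probs, points))
    var = sum(p * abs(s - mean) ** 2 for p, s in zip(probs, points))
    return mean, var


# Assumed alphabet: QPSK points with |s|^2 = 2.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

# A uniform belief carries no information: zero mean, full symbol energy.
mean_u, var_u = gaussian_projection(qpsk, [0.25, 0.25, 0.25, 0.25])

# A belief concentrated on 1+1j pulls the mean toward it and shrinks the variance.
mean_s, var_s = gaussian_projection(qpsk, [0.7, 0.1, 0.1, 0.1])
print(mean_u, var_u, mean_s, var_s)
```

A full EP iteration would additionally divide the projected Gaussian by the incoming (cavity) Gaussian message to obtain the outgoing message; that step is omitted here.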
Funding: Supported by the National Natural Science Foundation of China under Grant U19B2014, the Sichuan Science and Technology Program under Grant 2023NSFSC0457, and the Fundamental Research Funds for the Central Universities under Grant 2242022k60006.
Abstract: In this paper, a powerful model-driven deep learning framework is exploited to overcome the challenge of multi-domain signal detection in space-domain index modulation (SDIM) based multiple-input multiple-output (MIMO) systems. Specifically, we use the orthogonal approximate message passing (OAMP) technique to develop OAMPNet, a novel signal recovery mechanism from the field of compressed sensing that effectively uses the sparsity of the training SDIM samples. For OAMPNet, the prior probability of the transmit signal has a significant impact on the achievable performance. For this reason, in our design, we first derive the prior probability of the transmit signal on each antenna for SDIM-MIMO systems, which differs from that of conventional massive MIMO systems. Then, for massive MIMO scenarios, we propose two novel algorithms that avoid pre-storing all active antenna combinations, thus considerably improving memory efficiency and reducing the related overhead. Our simulation results show that the proposed framework outperforms conventional optimization-driven detection algorithms and is strongly robust across different antenna scales.
Funding: Supported by the Key Scientific Research Project in Colleges and Universities of Henan Province of China (Grant No. 21A510003) and the Key Science and Technology Research Project of Henan Province of China (Grant No. 222102210053).
Abstract: The orthogonal time frequency space (OTFS) technique, which modulates data symbols in the delay-Doppler (DD) domain, is a potential solution for supporting reliable information transmission in high-mobility vehicular networks. In this paper, we study DD channel estimation for OTFS in the presence of fractional Doppler. We first propose a channel estimation algorithm with both low complexity and high accuracy based on unitary approximate message passing (UAMP), which exploits the structured sparsity of the effective DD domain channel using a hidden Markov model (HMM). Empirical state evolution (SE) analysis is then used to predict the performance of the proposed algorithm. To refine the hyperparameters of the proposed algorithm, we derive their update criterion through the expectation-maximization (EM) algorithm. Finally, our simulation results demonstrate that the proposed algorithm achieves a significant gain over various baseline schemes.
Funding: Supported in part by the NSFC Project under Grant No. 61871334, in part by the open research fund of the State Key Laboratory of Integrated Services Networks, Xidian University, under Grant No. ISN21-15, and in part by the Fundamental Research Funds for the Central Universities, SWJTU, under Grant No. 2682020CX79; also supported by the NSFC project under Grant No. 61731017 and the “111” project under Grant No. 111-2-14.
Abstract: The newly emerging orthogonal time frequency space (OTFS) modulation can obtain delay-Doppler diversity gain by employing the inverse symplectic finite Fourier transform (ISFFT) and the symplectic finite Fourier transform (SFFT), which significantly improves system performance in high-mobility wireless communication scenarios such as vehicle-to-everything (V2X), high-speed railway, and unmanned aerial vehicles (UAVs). However, OTFS modulation dramatically increases system complexity, especially at the receiver side. Designing a low-complexity OTFS receiver is therefore a key issue for the adoption of OTFS modulation in new-generation wireless communication systems. In this paper, we review low-complexity OTFS detectors and provide some insights into future research. We first present the OTFS system model and basic principles, followed by an overview of OTFS detector structures, their classification, and a comparative discussion. We also survey the principles of OTFS detection algorithms. Furthermore, we discuss the design of hybrid OTFS and orthogonal frequency division multiplexing (OFDM) detectors in single-user and multi-user multi-waveform communication systems. Finally, we address the main challenges in designing low-complexity OTFS detectors and identify some future research directions.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 61701057) and the Chongqing Research Program of Basic Research and Frontier Technology, China (Grant No. cstc2017jcyjAX0345).
Abstract: We present a time-domain hybrid method for fast coupling analysis of transmission lines excited by space electromagnetic fields, in which the parallel finite-difference time-domain (FDTD) method, an interpolation scheme, and Agrawal model-based transmission line (TL) equations are integrated. Specifically, the Agrawal model is employed to establish the TL equations that describe the coupling of space electromagnetic fields to the transmission lines. Then, the excitation fields, which act as distributed sources in the TL equations, are calculated by the parallel FDTD method using the message passing interface (MPI) library and the interpolation scheme. Finally, the TL equations are discretized by the central difference scheme of FDTD and assigned to multiple processors to obtain the transient responses at the terminal loads of the lines. The significant feature of the presented method is its parallel and synchronous calculation of the space electromagnetic fields and the transient responses on the lines. Numerical simulations of an ambient wave acting on multi-conductor transmission lines (MTLs), located on the PEC ground and in a shielded cavity respectively, are carried out to verify the accuracy and efficiency of the presented method.
Funding: Supported by the National Basic Research Program of China (No. 2010CB832706) and the State Key Laboratory of Explosion Science and Technology (No. ZDKT10-03b).
Abstract: Explosion and shock often involve large deformation, interface treatment between multiple materials, and strong discontinuities. The Eulerian method has advantages for solving these problems. In parallel computation with the Eulerian method, the physical quantities of the computational cells do not change before the disturbance reaches them. Computational efficiency is therefore low with a fixed partition because of load imbalance. To solve this problem, a dynamic parallel method is used in which the computation domain expands with the disturbance. The dynamic parallel program is designed on the basis of the widely used message passing interface model. The numerical results of the dynamic parallel program agree well with those of the original parallel program and with the actual situation.
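One possible reading of the dynamic partition described above is that only the cells already reached by the disturbance are divided among the processes, and that the division is recomputed as the disturbed region grows. The sketch below illustrates that reading for a 1-D grid; the function `partition`, the cell indices, and the number of ranks are all assumptions for illustration, and the abstract does not state how the actual program repartitions or migrates data.

```python
def partition(lo, hi, ranks):
    """Split the half-open cell range [lo, hi) into `ranks` contiguous blocks
    whose sizes differ by at most one cell."""
    base, extra = divmod(hi - lo, ranks)
    blocks, start = [], lo
    for r in range(ranks):
        size = base + (1 if r < extra else 0)  # first `extra` ranks get one more
        blocks.append((start, start + size))
        start += size
    return blocks


# Hypothetical run: the disturbed region grows around cell 500 of a larger grid.
early = partition(450, 550, 4)  # 100 active cells shared by 4 ranks
later = partition(300, 700, 4)  # 400 active cells after the front has advanced
print(early)
print(later)
```

With a fixed partition of the whole grid, ranks that own only undisturbed cells would sit idle early in the run; repartitioning the active range keeps the per-rank cell counts equal at every stage.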
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11674139, 11834005, and 11904145) and the Program for Changjiang Scholars and Innovative Research Team in Universities, China (Grant No. IRT-16R35).
Abstract: We propose an improved real-space parallel strategy for the density matrix renormalization group (DMRG) method, in which the boundaries of separate regions are adaptively distributed during DMRG sweeps. Compared with the original real-space parallel DMRG with fixed boundaries, our scheme greatly improves the parallel efficiency by shortening the waiting time between two adjacent tasks. We implement the new strategy with the message passing interface (MPI) and dynamically control the number of kept states according to the truncation error in each DMRG step. We study the performance of the new parallel strategy by calculating the ground state of a spin-cluster chain and of a quantum chemical Hamiltonian of the water molecule. The maximum parallel efficiencies for these two models are 91% and 76% on 4 nodes, which are much higher than those of the real-space parallel DMRG with fixed boundaries.
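To put the reported efficiencies in perspective, the short calculation below converts them to speedups. It assumes the conventional definition of parallel efficiency, E = T_1 / (p T_p), which the abstract does not spell out; the function names are illustrative.

```python
def parallel_efficiency(t_serial, t_parallel, nodes):
    """Conventional parallel efficiency: speedup divided by the node count."""
    return t_serial / (nodes * t_parallel)


def implied_speedup(efficiency, nodes):
    """Speedup T_1 / T_p implied by a given efficiency on `nodes` nodes."""
    return efficiency * nodes


spin_chain = implied_speedup(0.91, 4)  # spin-cluster chain
water = implied_speedup(0.76, 4)       # water-molecule Hamiltonian
print(round(spin_chain, 2), round(water, 2))
```

Under that definition, 91% and 76% efficiency on 4 nodes correspond to speedups of about 3.64 and 3.04 over a single node.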
Funding: Supported by the National Natural Science Foundation of China under Grants 61801523, 61971452, and 91538203.
Abstract: Low-complexity detectors play an essential role in massive multiple-input multiple-output (MIMO) transmission. In this work, we discuss the prospects of applying the approximate message passing (AMP) algorithm to massive MIMO detection. To this end, we need to reduce the occurrence of divergence in AMP iterations and close the performance gap between AMP and the optimum detector, while retaining its advantage of low computational load. Our solution is to build a neural network that learns and optimizes AMP detection with four groups of specifically designed learnable coefficients, so that the divergence rate and the detection mean squared error (MSE) are significantly reduced. Moreover, the proposed deep learning-based AMP converges much faster, and thus has a much lower computational complexity than conventional AMP, providing an alternative solution for massive MIMO detection. Extensive simulation experiments validate the advantages of the proposed deep learning-based AMP.
Funding: The Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. RG-12-611-43.
Abstract: The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model detected defects in 37 of the 40 codes, an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
Funding: Supported in part by the U.S. National Science Foundation under Grant Nos. 1818253, 1854828, 1931537, and 2007991; XRAC under Grant No. NCR-130002; and the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Abstract: The Slingshot interconnect designed by HPE/Cray is becoming more relevant in high-performance computing with its deployment on the upcoming exascale systems. In particular, it is the interconnect powering Frontier, the first exascale and highest-ranked supercomputer in the world. It offers various features such as adaptive routing, congestion control, and isolated workloads. The deployment of newer interconnects raises questions about performance, scalability, and potential bottlenecks, since interconnects are critical to scalability across nodes on these systems. In this paper, we delve into the challenges the Slingshot interconnect poses for current state-of-the-art MPI (message passing interface) libraries. In particular, we look at scalability when using Slingshot across nodes. We present a comprehensive evaluation using various MPI and communication libraries, including Cray MPICH, Open-MPI+UCX, RCCL, and MVAPICH2, on CPUs and GPUs on the Spock system, an early-access cluster deployed with Slingshot-10, AMD MI100 GPUs, and AMD Epyc Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect.
Funding: Supported by the National Natural Science Foundation of China (No. 61901403), the Science and Technology Key Project of Fujian Province, China (Nos. 2021HZ021004 and 2019HZ020009), the Open Research Fund of National Mobile Communications Research Laboratory, Southeast University (No. 2023D10), the Youth Innovation Fund of Natural Science Foundation of Xiamen (No. 3502Z20206039), the Science and Technology Key Project of Xiamen (No. 3502Z20221027), and the Xiamen Special Fund for Marine and Fishery Development (No. 21CZB011HJ02).
Abstract: Obtaining accurate channel information for underwater acoustic communications (UAC) has always been difficult because of the severe underwater propagation conditions, including frequency selectivity, high relative mobility, long propagation latency, and intensive ambient noise. To this end, a deep unfolding neural network based approach is proposed, in which the layers of the network mimic the iterations of a classical iterative sparse approximation algorithm so that deep learning extracts the inherent sparse features of the channel, and a scheme based on the Sparsity-Aware DNN (SA-DNN) for UAC estimation is proposed to improve the estimation accuracy. Moreover, we propose an enhanced method based on a Denoising Sparsity-Aware DNN (DeSA-DNN), which integrates a denoising CNN module into the sparsity-aware deep network, so that the degradation caused by intensive ambient noise can be eliminated and the estimation accuracy further improved. Simulation results demonstrate that the proposed schemes outperform state-of-the-art compressed sensing based and iterative sparse recovery schemes in channel recovery precision, pilot overhead, and robustness, particularly under non-ideal conditions of intensive ambient noise or inadequate measurement pilots.
Funding: This work was mainly supported by the National High-Technology Research and Development Program (863) [grant number 2013AA122801] and the National Science Foundation of the United States [Award No. 1251095]. It was also partially supported by the Fundamental Research Funds for the Central Universities [grant number ZYGX2015J111]; the project entitled ‘Design and development of the parallelism for typical remote sensing image algorithm based on heterogeneous computing’ from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences; the project entitled ‘CAST Innovation Fund: the Study of Agent and Cloud Based Spatial Big Data Service Chain’; and the National Natural Science Foundation of China [grant number 51277167].
Abstract: MODerate resolution atmospheric TRANsmission (MODTRAN) is a commercial remote sensing (RS) software package that has been widely used to simulate the radiative transfer of electromagnetic radiation through the Earth’s atmosphere and the radiation observed by a remote sensor. However, when very large RS datasets must be processed in simulation applications at a global scale, running MODTRAN on a modern workstation is extremely time-consuming. Parallel cluster computing therefore becomes vital for speeding up the process. This paper presents PMODTRAN, an implementation of a parallel task-scheduling algorithm based on MODTRAN. PMODTRAN reduced the processing time of the test cases used here from over 4.4 months on a workstation to less than a week on a local computer cluster. In addition, PMODTRAN can distribute tasks at different levels of granularity and has some extra features, such as dynamic load balancing and parameter checking.
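The benefit of the dynamic load balancing mentioned above can be seen in a small simulation. The sketch below is not PMODTRAN's scheduler, which the abstract does not describe; it assumes a list of hypothetical task durations and compares a fixed, contiguous split of the task list with a scheme in which each idle worker pulls the next task from a shared queue.

```python
import heapq


def static_makespan(durations, workers):
    """Finish time when contiguous blocks of tasks are assigned up front."""
    base, extra = divmod(len(durations), workers)
    finish, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        finish.append(sum(durations[start:start + size]))
        start += size
    return max(finish)


def dynamic_makespan(durations, workers):
    """Finish time when each idle worker takes the next task in the queue."""
    free_at = [0.0] * workers  # min-heap of times at which workers become idle
    heapq.heapify(free_at)
    for d in durations:
        t = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, t + d)  # it is busy until t + d
    return max(free_at)


# Hypothetical workload: four long runs followed by eight short ones.
tasks = [8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(tasks, 4), dynamic_makespan(tasks, 4))
```

For this workload the static split leaves one worker with three long tasks (makespan 24), while the pull-based scheme reaches the lower bound of 10 set by the total work of 40 divided over 4 workers.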
Funding: Supported by the decision support project of response to climate change of China, the National Natural Science Foundation of China (Nos. 41674085, 41604009, and 41621091), the Natural Science Foundation of Qinghai Province (No. 2019-ZJ-7034), and the Open Project of State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University (No. 2020-zz-03).
Abstract: A moisture advection scheme is an essential module of a numerical weather/climate model, representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computing the scalar advection involves boundary exchange and high bandwidth requirements, which makes it complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems owing to advancements in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present an accelerated PRM scalar advection scheme that uses the Message Passing Interface (MPI) and OpenACC to fully exploit the power of GPUs on a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimizations such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show a speedup of about 3.5 times for the entire model running at medium resolution with double precision, comparing the scheme’s elapsed time on a node with two GPUs (NVIDIA P100) against that on two 16-core CPUs (Intel Gold 6142). Further, experiments with a higher-resolution model on multiple GPUs show excellent scalability.