The Steered Response Power(SRP)method works well for sound source localization in noisy and reverberant environment.However,the large computation complexity limits its practical application.In this paper,a fast SRP se...The Steered Response Power(SRP)method works well for sound source localization in noisy and reverberant environment.However,the large computation complexity limits its practical application.In this paper,a fast SRP search method is proposed to reduce the computational complexity using small-aperture microphone array.The proposed method inspired by the SRP spatial spectrum includes two steps:first,the proposed method estimates the azimuth of the sound source roughly and determines whether the sound source is in far field or near field;then,different fine searching operations are performed according to the sound source being in far field or near field.Experiments both in simulation environments and real environments have been performed to compare the localization accuracy and computation complexity of the proposed method with those of the conventional SRP-PHAT algorithm.The results show that,the proposed method has a comparative accuracy with the conventional SRP algorithm,and achieves a reduction of 93.62%in computation complexity compared to the conventional SRP algorithm.展开更多
Speech separation is an active research topic that plays an important role in numerous applications,such as speaker recognition,hearing pros-thesis,and autonomous robots.Many algorithms have been put forward to improv...Speech separation is an active research topic that plays an important role in numerous applications,such as speaker recognition,hearing pros-thesis,and autonomous robots.Many algorithms have been put forward to improve separation performance.However,speech separation in reverberant noisy environment is still a challenging task.To address this,a novel speech separation algorithm using gate recurrent unit(GRU)network based on microphone array has been proposed in this paper.The main aim of the proposed algorithm is to improve the separation performance and reduce the computational cost.The proposed algorithm extracts the sub-band steered response power-phase transform(SRP-PHAT)weighted by gammatone filter as the speech separation feature due to its discriminative and robust spatial position in formation.Since the GRU net work has the advantage of processing time series data with faster training speed and fewer training parameters,the GRU model is adopted to process the separation featuresof several sequential frames in the same sub-band to estimate the ideal Ratio Masking(IRM).The proposed algorithm decomposes the mixture signals into time-frequency(TF)units using gammatone filter bank in the frequency domain,and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM.The operations of decomposing the mixture signal and reconstructing the target signal are completed in the frequency domain which can reduce the total computational cost.Experimental results demonstrate that the proposed algorithm realizes omnidirectional speech sep-aration in noisy and reverberant environments,provides good performance in terms of speech quality and intelligibility,and has the generalization capacity to reverberate.展开更多
Feedforward active noise control(ANC)system are widely used to reduce the wide-band noise in different application.In feedforward ANC systems,when the noise source is unknown,the misplacement of the reference micropho...Feedforward active noise control(ANC)system are widely used to reduce the wide-band noise in different application.In feedforward ANC systems,when the noise source is unknown,the misplacement of the reference microphone may violate the causality constraint.We present a performance analysis of the feedforward ANC system under a noncausal condition.The ANC system performance degrades when the degree of noncausality increases.This research applies the microphone array technique to feedforward ANC systems to solve the unknown noise source problem.The generalized cross-correlation(GCC)and steering response power(SRP)methods based on microphone array are used to estimate the noise source location.Then,the ANC system selects the proper reference microphone for a noise control algorithm.The simulation and experiment results show that the SRP method can estimate the noise source direction with 84%accuracy.The proposed microphone array integrated ANC system can dramatically improve the system performance.展开更多
Microphone array-based sound source localization(SSL)is a challenging task in adverse acoustic scenarios.To address this,a novel SSL algorithm based on deep neural network(DNN)using steered response power-phase transf...Microphone array-based sound source localization(SSL)is a challenging task in adverse acoustic scenarios.To address this,a novel SSL algorithm based on deep neural network(DNN)using steered response power-phase transform(SRP-PHAT)spatial spectrum as input feature is presented in this paper.Since the SRP-PHAT spatial power spectrum contains spatial location information,it is adopted as the input feature for sound source localization.DNN is exploited to extract the efficient location information from SRP-PHAT spatial power spectrum due to its advantage on extracting high-level features.SRP-PHAT at each steering position within a frame is arranged into a vector,which is treated as DNN input.A DNN model which can map the SRP-PHAT spatial spectrum to the azimuth of sound source is learned from the training signals.The azimuth of sound source is estimated through trained DNN model from the testing signals.Experiment results demonstrate that the proposed algorithm significantly improves localization performance whether the training and testing condition setup are the same or not,and is more robust to noise and reverberation.展开更多
The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in a real environment. However, the large computation complexity limits the practical application of SRP-PHAT. For a ...The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in a real environment. However, the large computation complexity limits the practical application of SRP-PHAT. For a microphone array, each location corresponds to a set of time differences of arrival (TDOAs), and this paper collects them into a TDOA vector. Since the TDOA vectors in the adjacent regions are similar, we present a fast algorithm based on clustering search to reduce the computation complexity of SRP-PHAT. In the training stage, the K-means or Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm is used to find the centroid in each cluster with similar TDOA vectors. In the procedure of sound localization, the optimal cluster is found by comparing the steered response powers (SRPs) of all centroids. The SRPs of all candidate locations in the optimal cluster are compared to localize the sound source. Experiments both in simulation environments and real environments have been performed to compare the localization accuracy and computational load of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method is able to reduce the computational load drastically and maintains almost the same localization accuracy and robustness as those of the conventional SRP-PHAT algorithm. The difference in localization performance brought by different clustering algorithms used in the training stage is trivial.展开更多
基金Supported by the National Natural Science Foundation of China(No.61201345)the Beijing Key Laboratory of Advanced Information Science and Network Technology(No.XDXX1308)
文摘The Steered Response Power(SRP)method works well for sound source localization in noisy and reverberant environment.However,the large computation complexity limits its practical application.In this paper,a fast SRP search method is proposed to reduce the computational complexity using small-aperture microphone array.The proposed method inspired by the SRP spatial spectrum includes two steps:first,the proposed method estimates the azimuth of the sound source roughly and determines whether the sound source is in far field or near field;then,different fine searching operations are performed according to the sound source being in far field or near field.Experiments both in simulation environments and real environments have been performed to compare the localization accuracy and computation complexity of the proposed method with those of the conventional SRP-PHAT algorithm.The results show that,the proposed method has a comparative accuracy with the conventional SRP algorithm,and achieves a reduction of 93.62%in computation complexity compared to the conventional SRP algorithm.
基金This work is supported by Nanjing Institute of Technology(NIT)fund for Research Startup Projects of Introduced talents under Grant No.YKJ202019Nature Sci-ence Research Project of Higher Education Institutions in Jiangsu Province under Grant No.21KJB510018+1 种基金National Nature Science Foundation of China(NSFC)under Grant No.62001215NIT fund for Doctoral Research Projects under Grant No.ZKJ2020003.
文摘Speech separation is an active research topic that plays an important role in numerous applications,such as speaker recognition,hearing pros-thesis,and autonomous robots.Many algorithms have been put forward to improve separation performance.However,speech separation in reverberant noisy environment is still a challenging task.To address this,a novel speech separation algorithm using gate recurrent unit(GRU)network based on microphone array has been proposed in this paper.The main aim of the proposed algorithm is to improve the separation performance and reduce the computational cost.The proposed algorithm extracts the sub-band steered response power-phase transform(SRP-PHAT)weighted by gammatone filter as the speech separation feature due to its discriminative and robust spatial position in formation.Since the GRU net work has the advantage of processing time series data with faster training speed and fewer training parameters,the GRU model is adopted to process the separation featuresof several sequential frames in the same sub-band to estimate the ideal Ratio Masking(IRM).The proposed algorithm decomposes the mixture signals into time-frequency(TF)units using gammatone filter bank in the frequency domain,and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM.The operations of decomposing the mixture signal and reconstructing the target signal are completed in the frequency domain which can reduce the total computational cost.Experimental results demonstrate that the proposed algorithm realizes omnidirectional speech sep-aration in noisy and reverberant environments,provides good performance in terms of speech quality and intelligibility,and has the generalization capacity to reverberate.
文摘Feedforward active noise control(ANC)system are widely used to reduce the wide-band noise in different application.In feedforward ANC systems,when the noise source is unknown,the misplacement of the reference microphone may violate the causality constraint.We present a performance analysis of the feedforward ANC system under a noncausal condition.The ANC system performance degrades when the degree of noncausality increases.This research applies the microphone array technique to feedforward ANC systems to solve the unknown noise source problem.The generalized cross-correlation(GCC)and steering response power(SRP)methods based on microphone array are used to estimate the noise source location.Then,the ANC system selects the proper reference microphone for a noise control algorithm.The simulation and experiment results show that the SRP method can estimate the noise source direction with 84%accuracy.The proposed microphone array integrated ANC system can dramatically improve the system performance.
基金This work is supported by the National Nature Science Foundation of China(NSFC)under Grant No.61571106Jiangsu Natural Science Foundation under Grant No.BK20170757the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant No.17KJD510002.
文摘Microphone array-based sound source localization(SSL)is a challenging task in adverse acoustic scenarios.To address this,a novel SSL algorithm based on deep neural network(DNN)using steered response power-phase transform(SRP-PHAT)spatial spectrum as input feature is presented in this paper.Since the SRP-PHAT spatial power spectrum contains spatial location information,it is adopted as the input feature for sound source localization.DNN is exploited to extract the efficient location information from SRP-PHAT spatial power spectrum due to its advantage on extracting high-level features.SRP-PHAT at each steering position within a frame is arranged into a vector,which is treated as DNN input.A DNN model which can map the SRP-PHAT spatial spectrum to the azimuth of sound source is learned from the training signals.The azimuth of sound source is estimated through trained DNN model from the testing signals.Experiment results demonstrate that the proposed algorithm significantly improves localization performance whether the training and testing condition setup are the same or not,and is more robust to noise and reverberation.
基金supported by the National Natural Science Foundation of China(Grant Nos. 60971098 and 61201345)the Beijing Key Laboratory of Advanced Information Science and Network Technology(Grant No.XDXX1308)
文摘The steered response power-phase transform (SRP-PHAT) sound source localization algorithm is robust in a real environment. However, the large computation complexity limits the practical application of SRP-PHAT. For a microphone array, each location corresponds to a set of time differences of arrival (TDOAs), and this paper collects them into a TDOA vector. Since the TDOA vectors in the adjacent regions are similar, we present a fast algorithm based on clustering search to reduce the computation complexity of SRP-PHAT. In the training stage, the K-means or Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm is used to find the centroid in each cluster with similar TDOA vectors. In the procedure of sound localization, the optimal cluster is found by comparing the steered response powers (SRPs) of all centroids. The SRPs of all candidate locations in the optimal cluster are compared to localize the sound source. Experiments both in simulation environments and real environments have been performed to compare the localization accuracy and computational load of the proposed method with those of the conventional SRP-PHAT algorithm. The results show that the proposed method is able to reduce the computational load drastically and maintains almost the same localization accuracy and robustness as those of the conventional SRP-PHAT algorithm. The difference in localization performance brought by different clustering algorithms used in the training stage is trivial.