Journal Articles
28 articles found (20 shown per page)
1. Speech Intelligibility Enhancement Algorithm Based on Multi-Resolution Power-Normalized Cepstral Coefficients (MRPNCC) for Digital Hearing Aids
Authors: Xia Wang, Xing Deng, Hongming Shen, Guodong Zhang, Shibing Zhang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, No. 2, pp. 693-710 (18 pages)
Speech intelligibility enhancement in noisy environments is still one of the major challenges for the hearing impaired in everyday life. Recently, machine-learning based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues of these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time-frequency (T-F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, calculated from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T-F units in the enhancement stage. The enhanced speech is synthesized using the estimated masking values and Wiener-filtered T-F units. Objective experimental results demonstrate that the proposed feature is superior to the comparison features in terms of HIT-FA, STOI, HASPI and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
Keywords: speech intelligibility enhancement; multi-resolution power-normalized cepstral coefficients; binary masking value; hearing impaired
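The binary masking labels described in this abstract follow the standard ideal-binary-mask recipe: a time-frequency unit is labeled 1 when its local SNR exceeds a threshold. A minimal numpy sketch of that recipe (the function name and the 0 dB local criterion are illustrative assumptions, not the authors' exact pipeline):

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Ideal binary mask over time-frequency units.

    A unit is labeled 1 when its local SNR (in dB) exceeds the
    local criterion `lc_db`, otherwise 0.
    """
    snr_db = 10.0 * np.log10(speech_power / noise_power)
    return (snr_db > lc_db).astype(int)

# Toy example: 2 frequency channels x 3 frames, unit noise power.
speech = np.array([[4.0, 1.0, 9.0],
                   [0.5, 2.0, 0.1]])
noise = np.ones_like(speech)
mask = ideal_binary_mask(speech, noise)
```

In an enhancement system such as the one described, a classifier is trained to predict these labels from features, and masked T-F units are kept while the rest are attenuated.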
2. Modified Cepstral Feature for Speech Anti-spoofing
Authors: 何明瑞, ZAIDI Syed Faham Ali, 田娩鑫, 单志勇, 江政儒, 徐珑婷. Journal of Donghua University (English Edition) (CAS), 2023, No. 2, pp. 193-201 (9 pages)
The hidden danger of automatic speaker verification (ASV) systems is various spoofed speeches. These threats can be classified into two categories, namely logical access (LA) and physical access (PA). To improve the identification capability of spoofed speech detection, this paper focuses on features. Firstly, following the idea of modifying constant-Q-based features, this work considered adding variance or mean in the constant-Q-based cepstral domain to obtain good performance. Secondly, linear frequency cepstral coefficients (LFCCs) performed comparably with constant-Q-based features. Finally, we propose linear frequency variance-based cepstral coefficients (LVCCs) and linear frequency mean-based cepstral coefficients (LMCCs) for identification of speech spoofing. LVCCs and LMCCs are obtained by adding the frame variance or mean to the log magnitude spectrum on top of LFCC features. The proposed features were evaluated on the ASVspoof 2019 dataset. The experimental results show that, compared with known hand-crafted features, LVCCs and LMCCs are more effective in resisting spoofed speech attacks.
Keywords: spoofed speech detection; log magnitude spectrum; linear frequency cepstral coefficient (LFCC); hand-crafted feature
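One plausible reading of "adding the frame variance to the log magnitude spectrum" can be sketched in numpy: append each frame's variance as an extra spectral bin, then decorrelate with a DCT as LFCC extraction does. The function names, the orthonormal DCT-II, and the 13-coefficient truncation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along the last axis (equivalent to
    scipy.fft.dct(x, norm='ortho'), written out with numpy only)."""
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return x @ (scale[:, None] * basis).T

def lvcc_like(log_mag_frames, n_coeffs=13):
    """Append each frame's variance to its log magnitude spectrum,
    then decorrelate with a DCT, mimicking the LVCC recipe."""
    var = log_mag_frames.var(axis=1, keepdims=True)
    augmented = np.concatenate([log_mag_frames, var], axis=1)
    return dct2_ortho(augmented)[:, :n_coeffs]

# Toy input: 5 frames of a 128-bin magnitude spectrum.
frames = np.abs(np.random.default_rng(0).normal(size=(5, 128)))
log_mag = np.log(frames + 1e-8)
feats = lvcc_like(log_mag)
```

The variance bin rides through the DCT alongside the spectral bins, so every output coefficient carries some of the frame-variance information.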
3. Comprehensive Analysis of Gender Classification Accuracy across Varied Geographic Regions through the Application of Deep Learning Algorithms to Speech Signals
Authors: Abhishek Singhal, Devendra Kumar Sharma. Computer Systems Science & Engineering, 2024, No. 3, pp. 609-625 (17 pages)
This article presents an exhaustive comparative investigation into the accuracy of gender identification across diverse geographical regions, employing a deep learning classification algorithm for speech signal analysis. In this study, speech samples are categorized for both training and testing purposes based on their geographical origin. Category 1 comprises speech samples from speakers outside of India, whereas Category 2 comprises live-recorded speech samples from Indian speakers. Testing speech samples are likewise classified into four distinct sets, taking into consideration both geographical origin and the language spoken by the speakers. Significantly, the results indicate a noticeable difference in gender identification accuracy among speakers from different geographical areas. Indian speakers, utilizing 52 Hindi and 26 English phonemes in their speech, demonstrate a notably higher gender identification accuracy of 85.75% compared with speakers who predominantly use 26 English phonemes, when the system is trained using speech samples from Indian speakers. The gender identification accuracy of the proposed model reaches 83.20% when the system is trained using speech samples from speakers outside of India. In the analysis of speech signals, Mel-frequency cepstral coefficients (MFCCs) serve as the features for the speech data. The deep learning classification algorithm used in this research is based on a Bidirectional Long Short-Term Memory (BiLSTM) architecture within a Recurrent Neural Network (RNN) model.
Keywords: deep learning; recurrent neural network; voice signal; Mel-frequency cepstral coefficients; geographical area; gender
4. Research on Blind Source Separation of Operation Sounds of a Metro Power Transformer through an Adaptive Threshold REPET Algorithm
Authors: Liang Chen, Liyi Xiong, Fang Zhao, Yanfei Ju, An Jin. Railway Sciences, 2024, No. 5, pp. 609-621 (13 pages)
Purpose – The safe operation of the metro power transformer directly relates to the safety and efficiency of the entire metro system. Through voiceprint technology, the sounds emitted by the transformer can be monitored in real time, thereby achieving real-time monitoring of the transformer's operational status. However, the environment surrounding power transformers is filled with various interfering sounds that intertwine with both the normal operational voiceprints and the faulty voiceprints of the transformer, severely impacting the accuracy and reliability of voiceprint identification. Therefore, effective preprocessing steps are required to identify and separate the transformer's operational sound signals, a prerequisite for subsequent analysis.
Design/methodology/approach – This paper proposes an Adaptive Threshold Repeating Pattern Extraction Technique (REPET) algorithm to separate and denoise the transformer operation sound signals. By analyzing the Short-Time Fourier Transform (STFT) amplitude spectrum, the algorithm identifies and exploits the repeating periodic structures within the signal to automatically adjust the threshold, effectively distinguishing and extracting stable background signals from transient foreground events. The REPET algorithm first calculates the autocorrelation matrix of the signal to determine the repeating period, then constructs a repeating segment model. Through comparison with the amplitude spectrum of the original signal, repeating patterns are extracted and a soft time-frequency mask is generated.
Findings – After adaptive threshold processing, the target signal is separated. Experiments on mixed sounds, separating background from foreground with this algorithm and comparing the results against the FastICA algorithm, demonstrate that the Adaptive Threshold REPET method achieves good separation.
Originality/value – A REPET method with an adaptive threshold is proposed. It adopts a dynamic threshold adjustment mechanism that adaptively calculates the threshold for blind source separation, improving the algorithm's adaptability and robustness to the statistical characteristics of the signal. It also lays the foundation for transformer fault detection based on acoustic fingerprinting.
Keywords: transformer; voiceprint recognition; blind source separation; Mel-frequency cepstral coefficients (MFCC); adaptive threshold
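The repeating-period step of REPET described above can be illustrated with a deliberately simplified numpy sketch: autocorrelate the spectrogram's per-frame energy envelope and take the strongest non-trivial lag. This is an assumption-laden reduction (REPET proper autocorrelates every spectrogram row and builds a full repeating-segment model), meant only to show the idea:

```python
import numpy as np

def repeating_period(spectrogram, min_lag=1):
    """Estimate the repeating period (in frames) of a magnitude
    spectrogram by autocorrelating its per-frame energy envelope."""
    energy = spectrogram.sum(axis=0)
    energy = energy - energy.mean()
    ac = np.correlate(energy, energy, mode="full")[len(energy) - 1:]
    ac[:min_lag] = -np.inf  # ignore the trivial zero-lag peak
    return int(np.argmax(ac))

# Toy spectrogram whose frame energy repeats every 4 frames.
frame_energy = np.tile([4.0, 1.0, 0.5, 1.0], 6)
spec = np.vstack([frame_energy, frame_energy])
period = repeating_period(spec)
```

Once the period is known, REPET averages period-length segments into a repeating model and converts the model-vs-signal comparison into a soft time-frequency mask.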
5. Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 5, pp. 1053-1089 (37 pages)
Speech recognition systems have become a unique member of the human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent, hands-free computation experience. This paper presents a retrospective yet modern view of the world of speech recognition systems. The development of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review aims to summarize the field and provide a starting point for those entering the vast area of speech signal processing. Since speech recognition has vast potential in industries such as telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers exploring applications that society can readily adopt in the coming years.
Keywords: speech recognition; automatic speech recognition (ASR); Mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
6. CNN-Based RF Fingerprinting Method for Securing Passive Keyless Entry and Start System
Authors: Hyeon Park, SeoYeon Kim, Seok Min Ko, TaeGuen Kim. Computers, Materials & Continua (SCIE, EI), 2023, No. 8, pp. 1891-1909 (19 pages)
The rapid growth of modern vehicles with advanced technologies requires strong security to ensure customer safety. One key system that needs protection is the passive keyless entry system (PKES). To prevent attacks aimed at defeating the PKES, we propose a novel radio frequency (RF) fingerprinting method. Our method extracts a cepstral coefficient feature as the fingerprint of a radio frequency signal. This feature is then analyzed using a convolutional neural network (CNN) for device identification. In our evaluation, we conducted experiments to determine the effectiveness of different cepstral coefficient features and of the convolutional neural network-based model. The results revealed that the Gammatone-Frequency Cepstral Coefficient (GFCC) was the most effective feature compared with the Mel-Frequency Cepstral Coefficient (MFCC), Inverse Mel-Frequency Cepstral Coefficient (IMFCC), Linear-Frequency Cepstral Coefficient (LFCC), and Bark-Frequency Cepstral Coefficient (BFCC). We also evaluated our method against similar existing approaches.
Keywords: RF fingerprint; cepstral coefficient; convolutional neural network
7. Implementation of Hybrid Deep Reinforcement Learning Technique for Speech Signal Classification
Authors: R. Gayathri, K. Sheela Sobana Rani. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 7, pp. 43-56 (14 pages)
Classification of speech signals is a vital part of speech signal processing systems. With the advent of speech coding and synthesis, classification of the speech signal has become more accurate and faster. Conventional methods are considered inaccurate due to the uncertainty and diversity of real speech signals. In this paper, we perform efficient speech signal classification using a series of neural network classifiers with reinforcement learning operations. Prior to classification, the study extracts the essential features from the speech signal using cepstral analysis. The features are extracted by converting the speech waveform into a parametric representation at a relatively low data rate. To improve the precision of classification, Generative Adversarial Networks are used to classify the speech signal after cepstral-coefficient features have been extracted. The classifiers are first trained with these features, and the best classifier is chosen to perform classification on new datasets. Validation on the test sets is evaluated using RL, which provides feedback to the classifiers. Finally, at the user interface, signals are played back by decoding the signal retrieved from the classifier based on the input query. The results are evaluated in terms of accuracy, recall, precision, F-measure, and error rate, where the generative adversarial network attains a higher accuracy rate than the other methods: Multi-Layer Perceptron, Recurrent Neural Networks, Deep Belief Networks, and Convolutional Neural Networks.
Keywords: neural network (NN); reinforcement learning (RL); cepstral coefficient; speech signal classification
8. Comparison of Khasi Speech Representations with Different Spectral Features and Hidden Markov States
Authors: Bronson Syiem, Sushanta Kabir Dutta, Juwesh Binong, Lairenlakpam Joyprakash Singh. Journal of Electronic Science and Technology (CAS, CSCD), 2021, No. 2, pp. 155-162 (8 pages)
In this paper, we present a comparison of Khasi speech representations with four different spectral features, and a novel extension towards the development of Khasi speech corpora. The four features are linear predictive coding (LPC), linear prediction cepstral coefficients (LPCC), perceptual linear prediction (PLP), and Mel-frequency cepstral coefficients (MFCC). Ten hours of speech data were used for training and three hours for testing. For each spectral feature, different hidden Markov model (HMM) based recognizers were built, varying the number of HMM states and the Gaussian mixture models (GMMs). Performance was evaluated using the word error rate (WER). The experimental results show that MFCC provides a better representation of Khasi speech than the other three spectral features.
Keywords: acoustic model (AM); Gaussian mixture model (GMM); hidden Markov model (HMM); language model (LM); linear predictive coding (LPC); linear prediction cepstral coefficient (LPCC); Mel frequency cepstral coefficient (MFCC); perceptual linear prediction (PLP)
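The word error rate (WER) used for evaluation above is the word-level Levenshtein distance between reference and hypothesis, normalized by the reference length; a minimal, self-contained sketch (the example sentences are arbitrary English placeholders, not Khasi data):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the classic Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[-1][-1] / len(ref)

# One deleted word out of four reference words -> WER = 0.25.
wer = word_error_rate("the cat sat down", "the cat sat")
```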
9. A Dynamic Speech Feature Extraction Algorithm Based on Cosine Similarity (cited 11 times)
Authors: 艾佳琪, 左毅, 刘君霞, 贺培超, 李铁山, 陈俊龙. Application Research of Computers (CSCD, Peking University Core), 2020, No. S02, pp. 147-149 (3 pages)
To further study speech feature extraction methods, this paper analyzes speech features based on inverse discrete cosine transform cepstral coefficients (IDCT CC). Using the cosine similarity between frequency-domain speech signals, the IDCT CCs are hierarchically clustered to obtain a 14-dimensional frequency-domain speech feature vector, called the C-vector. In the experiments, a speaker recognition model based on a Gaussian mixture model (GMM) was built to examine the recognition accuracy and running time of the C-vector, with comparison experiments against the classic Mel-frequency cepstral coefficients (MFCC) and the histogram of DCT cepstrum coefficients (HDCC). The results show that the proposed C-vector improves recognition accuracy by 7% and 5% over MFCC and HDCC, respectively. Moreover, the C-vector shows superior recognition ability on multi-speaker speech sets.
Keywords: speaker recognition; speech features; Mel-frequency cepstral coefficients (MFCC); inverse discrete cosine transform cepstral coefficient (IDCT CC); cosine similarity; hierarchical cluster analysis
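The cosine similarity that drives the hierarchical clustering above is simply the normalized dot product between feature vectors; a minimal numpy sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors: 1.0 means the
    same direction, 0.0 orthogonal, -1.0 opposite."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Parallel vectors score 1.0 regardless of magnitude; orthogonal score 0.0.
sim_same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
sim_orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

In an agglomerative clustering, 1 - cosine_similarity would serve as the pairwise distance between cepstral-coefficient vectors.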
10. Wake-Up-Word Feature Extraction on FPGA
Authors: Veton Z. Kepuska, Mohamed M. Eljhani, Brian H. Hight. World Journal of Engineering and Technology, 2014, No. 1, pp. 1-12 (12 pages)
The Wake-Up-Word Speech Recognition task (WUW-SR) is computationally very demanding, particularly the feature extraction stage, whose output is decoded with corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR system. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPC), and Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC). In "Front-End of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel architecture for a real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. In this paper, we present the details of converting the three sets of spectrograms (MFCC, LPC, and ENH_MFCC) to their equivalent features. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW-SR system is shown in Figure 1. The three sets of speech features are extracted at the front-end, then compressed and transmitted to the server via a dedicated channel, where they are subsequently decoded.
Keywords: speech recognition system; feature extraction; Mel-frequency cepstral coefficients; linear predictive coding coefficients; enhanced Mel-frequency cepstral coefficients; hidden Markov models; field-programmable gate arrays
11. An Efficient Approach for Segmentation, Feature Extraction and Classification of Audio Signals
Authors: Muthumari Arumugam, Mala Kaliappan. Circuits and Systems, 2016, No. 4, pp. 255-279 (25 pages)
Due to the presence of non-stationarities and discontinuities in the audio signal, segmentation and classification of audio signals is a challenging task. Automatic music classification and annotation remains challenging because of the difficulty of extracting and selecting optimal audio features. Hence, this paper proposes an efficient approach for segmentation, feature extraction and classification of audio signals. Feature extraction based on Enhanced Mel-Frequency Cepstral Coefficients (EMFCC) and Enhanced Power-Normalized Cepstral Coefficients (EPNCC) is applied to the audio signal. Then, multi-level classification is performed to classify the audio signal as musical or non-musical. The proposed approach achieves better performance in terms of precision, Normalized Mutual Information (NMI), F-score and entropy. The PNN classifier shows high False Rejection Rate (FRR), False Acceptance Rate (FAR), Genuine Acceptance Rate (GAR), sensitivity, specificity and accuracy with respect to the number of classes.
Keywords: audio signal; Enhanced Mel-Frequency Cepstral Coefficients (EMFCC); Enhanced Power-Normalized Cepstral Coefficients (EPNCC); Probabilistic Neural Network (PNN) classifier
12. Integrated Search Technique for Parameter Determination of SVM for Speech Recognition (cited 2 times)
Authors: Teena Mittal, R. K. Sharma. Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, No. 6, pp. 1390-1398 (9 pages)
Support vector machines (SVM) have good application prospects for speech recognition problems; still, optimal parameter selection is a vital issue. To improve the learning ability of SVM, a method for searching the optimal parameters based on the integration of predator-prey optimization (PPO) and the Hooke-Jeeves method is proposed. In the PPO technique, the population consists of prey and predator particles. The prey particles search for the optimal solution, and the predator always attacks the globally best prey particle. The solution obtained by PPO is further improved by applying the Hooke-Jeeves method. The proposed method is applied to recognize isolated words in a Hindi speech database, and words in the benchmark database TI-20, in clean and noisy environments. A recognition rate of 81.5% for the Hindi database and 92.2% for the TI-20 database is achieved with the proposed technique.
Keywords: support vector machine (SVM); predator-prey optimization; speech recognition; Mel-frequency cepstral coefficients; wavelet packets; Hooke-Jeeves method
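The Hooke-Jeeves refinement stage mentioned above is a derivative-free pattern search: probe each coordinate in turn, keep any improving move, and shrink the step when nothing improves. A minimal numpy sketch; the step sizes, shrink factor, and toy quadratic objective (standing in for SVM cross-validation error over its parameters) are illustrative assumptions:

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6, max_iter=500):
    """Derivative-free pattern search: try +/- step along each
    coordinate, accept improving moves, shrink the step otherwise."""
    x = np.asarray(x0, float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(x.size):
            for direction in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += direction * step
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= shrink
            if step < tol:
                break
    return x, fx

# Toy objective with minimum at (3, -1), e.g. a stand-in for CV error
# over two SVM hyperparameters.
best_x, best_f = hooke_jeeves(lambda v: (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2,
                              x0=[0.0, 0.0])
```

In the paper's scheme, PPO would supply `x0` (the best prey particle) and this local search would polish it.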
13. Application of Hidden Markov Models in Speech Command Recognition (cited 2 times)
Authors: Shing-Tai Pan, Zong-Hong Huang, Sheng-Syun Yuan, Xu-Yu Li, Yu-De Su, Jia-Hua Li. Journal of Mechanics Engineering and Automation, 2020, No. 2, pp. 41-45 (5 pages)
In this study, vector quantization and hidden Markov models were used to achieve speech command recognition. Pre-emphasis, a Hamming window, and Mel-frequency cepstral coefficients were first adopted to obtain feature values. Subsequently, vector quantization and HMMs (hidden Markov models) were employed for speech command recognition. The recorded speech, three Chinese characters in length, was used to test the method. Five phrases pronounced by a mix of human voices were recorded and used to test the models. The recorded phrases were then used for speech command recognition to determine whether the experimental results were satisfactory.
Keywords: HMMs; Mel-frequency cepstral coefficients; speech command recognition; vector quantization
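The pre-emphasis and Hamming-window steps mentioned above are the standard front of an MFCC pipeline and are easy to sketch in numpy. The filter coefficient 0.97 and the 25 ms frame / 10 ms hop at 16 kHz are common defaults, assumed here rather than taken from the paper:

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1],
    boosting high frequencies before framing."""
    signal = np.asarray(signal, float)
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))

def frame_and_window(signal, frame_len=400, hop=160):
    """Slice the signal into overlapping frames and apply a Hamming
    window to each frame (400/160 samples = 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

# One second of a 440 Hz tone at 16 kHz -> 98 windowed frames.
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
frames = frame_and_window(preemphasize(x))
```

Each windowed frame would then go through an FFT, mel filterbank, log, and DCT to produce the MFCC vectors fed to vector quantization.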
14. Environmental Sound Classification Using Deep Learning (cited 7 times)
Authors: Shanthakumar S, Shakila S, Suneth Pathirana, Jayalath Ekanayake. Instrumentation, 2020, No. 3, pp. 15-22 (8 pages)
Hearing-impaired individuals may be unable to identify environmental sounds due to the noise around them, yet very little research has been conducted in this domain. The aim of this study is therefore to categorize sounds generated in the environment so that hearing-impaired individuals can distinguish the sound categories. To that end, we first define nine sound classes that typically exist in the environment: air conditioner, car horn, children playing, dog bark, drilling, engine idling, jackhammer, siren, and street music. We then record 100 sound samples from each category and extract features of each sound category using Mel-Frequency Cepstral Coefficients (MFCC). The training dataset is built from this set of features together with the class variable, i.e. the sound category. Sound classification is a complex task, so we use two deep learning techniques, the Multi-Layer Perceptron (MLP) and the Convolutional Neural Network (CNN), to train classification models. The models are tested using a separate test set, and their performance is evaluated using precision, recall and F1-score. The results show that the CNN model outperforms the MLP; however, the MLP also provides decent accuracy in classifying unknown environmental sounds.
Keywords: Mel-Frequency Cepstral Coefficients (MFCC); Multi-Layer Perceptron (MLP); Convolutional Neural Network (CNN)
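The mel scale underlying the MFCC features above maps linear frequency to perceptual pitch; MFCC extractors place triangular filters at equal spacing on this scale. A minimal numpy sketch of the conversion and of placing filterbank center frequencies (the 2595/700 constants are the usual O'Shaughnessy formula; the 10-band, 0-8000 Hz layout is an illustrative assumption):

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (O'Shaughnessy's
    formula, used by most MFCC implementations)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, float) / 2595.0) - 1.0)

# Center frequencies of a 10-band mel filterbank spanning 0-8000 Hz:
# equally spaced in mel, hence increasingly far apart in Hz.
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 10 + 2)
centers_hz = mel_to_hz(mel_points[1:-1])
```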
15. Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions (cited 7 times)
Authors: Veton Z. Kepuska, Hussien A. Elharati. Journal of Computer and Communications, 2015, No. 6, pp. 1-9 (9 pages)
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, namely Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Coding Coefficients (LPCC), perceptual linear prediction (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated using the TIDIGITS human voice dataset, recorded from 208 different adult speakers, in both the training and testing processes. The theoretical basis for the speech processing and classifier procedures is presented, and recognition results are reported as word recognition rates.
Keywords: speech recognition; noisy conditions; feature extraction; Mel-frequency cepstral coefficients; linear predictive coding coefficients; perceptual linear production; RASTA-PLP; isolated speech; hidden Markov model
16. Multi-Factor Authentication for Secured Financial Transactions in Cloud Environment
Authors: D. Prabakaran, Shyamala Ramachandran. Computers, Materials & Continua (SCIE, EI), 2022, No. 1, pp. 1781-1798 (18 pages)
The rise of the digital economy and the convenience of access from user mobile devices have expedited financial transactions over the Virtual Private Network (VPN) backbone. This prominent application of VPN evades the hurdles involved in physical money exchange. The VPN acts as a gateway for the authorized user to access the banking server, providing mutual authentication between the user and the server. Security in cloud authentication servers remains vulnerable, as shown by the JP Morgan data breach in 2014, the Capital One data breach in 2019, and many other cloud server attacks. These attacks necessitate a strong authentication framework that is secure against every class of threat. This paper proposes a framework based on Elliptic Curve Cryptography (ECC) to perform secure financial transactions through a Virtual Private Network (VPN) by implementing strong Multi-Factor Authentication (MFA) using authentication credentials and biometric identity. The results show that the proposed model is an ideal scheme for real-time implementation. The security analysis reports that the proposed model exhibits a high level of security, with a minimal response time of 12 s on average for 1000 users.
Keywords: cloud computing; elliptic curve cryptography; multi-factor authentication; Mel frequency cepstral coefficient; privacy protection; secured framework; secure financial transactions
17. Application of Formant Instantaneous Characteristics to Speech Recognition and Speaker Identification
Authors: 侯丽敏, 胡晓宁, 谢娟敏. Journal of Shanghai University (English Edition) (CAS), 2011, No. 2, pp. 123-127 (5 pages)
This paper proposes a new phase feature derived from formant instantaneous characteristics for speech recognition (SR) and speaker identification (SI) systems. Using the Hilbert transform (HT), the formant characteristics can be represented by the instantaneous frequency (IF) and instantaneous bandwidth, collectively called the formant instantaneous characteristics (FIC). To explore the importance of FIC in both SR and SI, this paper derives different features from FIC for SR and SI systems. When these new features are combined with conventional parameters, a higher identification rate can be achieved than with Mel-frequency cepstral coefficients (MFCC) alone. The experimental results show that the new features are effective characteristic parameters and can serve as a complement to conventional parameters for SR and SI.
Keywords: instantaneous frequency (IF); Hilbert transform (HT); speech recognition; speaker identification; Mel-frequency cepstral coefficients (MFCC)
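The Hilbert-transform route to instantaneous frequency described above can be sketched with numpy's FFT, using the same frequency-domain construction that scipy.signal.hilbert performs: zero the negative frequencies, double the positive ones, and differentiate the unwrapped analytic phase. The tone parameters are illustrative; the tone is placed on an exact FFT bin so the recovered IF is flat:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT method: keep DC and Nyquist,
    double positive frequencies, zero negative frequencies."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

def instantaneous_frequency(x, fs):
    """IF (Hz) as the scaled derivative of the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)

fs = 8000.0
t = np.arange(4096) / fs
f0 = fs * 225 / 4096  # an exact FFT bin, so the tone is periodic in the window
tone = np.cos(2 * np.pi * f0 * t)
inst_f = instantaneous_frequency(tone, fs)
```

For formant analysis, the same computation would be applied to each bandpass-filtered formant band rather than to the full-band signal.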
18. Extraction of Novel Features for Emotion Recognition
Authors: 李翔, 郑宇, 李昕. Journal of Shanghai University (English Edition) (CAS), 2011, No. 5, pp. 479-486 (8 pages)
The Hilbert-Huang transform has been widely used since its inception because of its superiority in a variety of areas. The Hilbert spectrum thus obtained accurately reflects the distribution of signal energy over a number of scales. In this paper, a novel feature called ECC is proposed via feature extraction from the Hilbert energy spectrum, which describes the distribution of instantaneous energy. The experimental results demonstrate that ECC clearly outperforms the traditional short-term average energy. Combining ECC with Mel frequency cepstral coefficients (MFCC) delineates the distribution of energy in both the time and frequency domains, and this feature group achieves a better recognition effect than the combination of short-term average energy, pitch and MFCC. Further improvements of ECC are then developed: TECC, obtained by combining ECC with the Teager energy operator, and EFCC, obtained by introducing the instantaneous frequency into the energy. In the experiments, seven emotion states were selected for recognition, and the highest recognition rate achieved was 83.57%, with the classification accuracy for boredom reaching 100%. The numerical results indicate that the proposed features ECC, TECC and EFCC can substantially improve the performance of speech emotion recognition.
Keywords: emotion recognition; Mel frequency cepstral coefficients (MFCC); feature extraction
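The Teager energy operator used to build TECC has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1], which for a pure tone A*cos(w*n) evaluates exactly to A^2*sin(w)^2, so it tracks both amplitude and frequency. A minimal numpy sketch (the tone parameters are arbitrary):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1]
    (output is 2 samples shorter than the input)."""
    x = np.asarray(x, float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

w = 2 * np.pi * 200.0 / 8000.0   # 200 Hz tone sampled at 8 kHz
amplitude = 0.5
tone = amplitude * np.cos(w * np.arange(1000))
psi = teager_energy(tone)        # constant at amplitude^2 * sin(w)^2
```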
19. A Novel System for Recognizing Recording Devices from Recorded Speech Signals
Authors: Yongqiang Bao, Qi Shao, Xuxu Zhang, Jiahui Jiang, Yue Xie, Tingting Liu, Weiye Xu. Computers, Materials & Continua (SCIE, EI), 2020, No. 12, pp. 2557-2570 (14 pages)
The field of digital audio forensics aims to detect threats and fraud in audio signals. Contemporary audio forensic techniques use digital signal processing to detect the authenticity of recorded speech, recognize speakers, and recognize recording devices. User-generated audio recordings from mobile phones are very helpful in a number of forensic applications. This article proposes a novel method for recognizing recording devices from recorded audio signals. First, a database of the features of various recording devices was constructed using 32 recording devices (20 mobile phones of different brands and 12 kinds of recording pens) in various environments. Second, the audio features of each recording device, such as the Mel-frequency cepstral coefficients (MFCC), were extracted from the audio signals and used as model inputs. Finally, support vector machines (SVM) with a fractional Gaussian kernel were used to recognize the recording devices from their audio features. Experiments demonstrated that the proposed method achieved 93.4% accuracy in recognizing recording devices.
Keywords: recording device recognition; Mel-frequency cepstral coefficients; support vector machines
20. Autonomous Surveillance of Infants' Needs Using CNN Model for Audio Cry Classification
Authors: Geofrey Owino, Anthony Waititu, Anthony Wanjoya, John Okwiri. Journal of Data Analysis and Information Processing, 2022, No. 4, pp. 198-219 (22 pages)
Infants produce distinctive, suggestive cries when sick, having belly pain, uncomfortable, tired, seeking attention, or needing a diaper change, among other needs. Knowledge of how to assess an infant's needs is limited, since infants relay information only through these suggestive cries. Many teenagers give birth at an early age and become the primary monitors of their own babies, often without sufficient skill in recognizing an infant's dire needs, especially during the early stages of infant development. Artificial intelligence has shown promising, efficient predictive analytics across supervised, unsupervised, and reinforcement learning models. This study therefore seeks to develop an Android app that can discriminate infant audio cries by leveraging the strength of convolutional neural networks as a classifier model. Audio analytics remains relatively untapped by researchers, as it tends to generate messy, voluminous data. This study therefore leverages convolutional neural networks, a deep learning model capable of handling more than one-dimensional datasets. To achieve this, the audio waveform data were converted to images through Mel spectrum frequencies, which were classified using a computer-vision CNN model. The Librosa library was used to convert the audio to Mel spectra, which were then presented as pixels serving as the input for classifying audio classes such as sick, burping, tired, and hungry. The goal of the study was to package the model as an Android tool that can be used at the domestic level and in hospital facilities for round-the-clock surveillance of an infant's health and social-needs status.
Keywords: Convolutional Neural Network (CNN); Mel Frequency Cepstral Coefficients (MFCCs); Rectified Linear Unit (ReLU) activation function; audio analytics; Deep Neural Network (DNN)