Age estimation from short speech utterances has many everyday applications, such as human-robot interaction, custom call routing, targeted marketing, and user profiling. Despite comprehensive studies on extracting descriptive features, estimation errors (measured in years) remain high. This study proposes an automatic system that estimates age from short speech utterances independently of both the text and the speaker. First, four groups of features are extracted from each utterance frame using hybrid techniques and methods. Next, 10 statistical functionals are computed for each extracted feature dimension. The resulting feature dimensions are then normalized using the Quantile method and reduced using Linear Discriminant Analysis (LDA). Finally, the speaker's age is estimated with a multi-class classification approach using the Extreme Gradient Boosting (XGBoost) classifier. Experiments were carried out on the TIMIT dataset to measure the performance of the proposed system. For female and male speakers, respectively, the Mean Absolute Error (MAE) is 4.68 and 4.98 years, and the Root Mean Square Error (RMSE) is 8.05 and 6.97 years. Compared with related works that used the TIMIT dataset, the results show a relative MAE improvement of up to 28% and 10% for female and male speakers, respectively.
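The two error measures reported in the abstract, and the relative improvement quoted against related works, can be illustrated with a minimal sketch. It assumes that each class predicted by the classifier corresponds to an age in whole years, which the abstract does not state explicitly. The function names and the toy ages below are hypothetical and are not taken from the study.

```python
import math


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def root_mean_square_error(true_ages, predicted_ages):
    """Square root of the mean squared age difference, in years."""
    total = sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages))
    return math.sqrt(total / len(true_ages))


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy values for illustration only (assumption: class labels are ages in years).
true_ages = [25, 31, 40, 52]
predicted_ages = [27, 30, 46, 49]

mae = mean_absolute_error(true_ages, predicted_ages)
rmse = root_mean_square_error(true_ages, predicted_ages)
print(f"MAE = {mae:.2f} years, RMSE = {rmse:.2f} years")
```

RMSE penalizes large individual errors more heavily than MAE, so a gap between the two values suggests that a few utterances are estimated far from the true age.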