Abstract
Most current models for music auto-tagging rely on hand-engineered features, which makes choosing the optimal feature difficult. We propose a feature learning algorithm based on unsupervised learning that automatically learns the underlying structure of the features without relying on prior knowledge. The algorithm has three stages. First, the preprocessing stage extracts the chroma-frequency spectrogram of the music and reduces its dimensionality via PCA whitening. Second, a denoising autoencoder, a deep learning algorithm, is applied to the reduced features in an unsupervised manner, and a new feature vector is aggregated by max-pooling and averaging. Finally, the feature vector is mapped to song labels by a pre-trained multilayer perceptron (MLP) in a supervised manner. Experimental results on the Magnatagatune and GTZAN datasets show that the proposed algorithm improves the accuracy of music auto-tagging to some degree.
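The first two stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame matrix is synthetic, the dimensions (40 bins, 16 components, 32 hidden units, pool size 10), the masking-noise rate, and the sigmoid encoder are assumptions, and the encoder weights are random rather than trained. The order of aggregation (max-pool over blocks of frames, then average the pooled blocks) is also an assumption, since the abstract does not specify it.

```python
import numpy as np

def pca_whiten(frames, n_components, eps=1e-5):
    """Project frames (n_frames, n_bins) onto the top principal
    components and rescale each component to roughly unit variance."""
    centered = frames - frames.mean(axis=0)
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = 1.0 / np.sqrt(eigvals[order] + eps)
    return centered @ eigvecs[:, order] * scale

def corrupt(x, drop_prob, rng):
    """Masking noise for a denoising autoencoder: zero each input
    value independently with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def encode(x, weights, bias):
    """Sigmoid hidden layer of an autoencoder (weights assumed given)."""
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

def aggregate(hidden, pool_size):
    """Max-pool over consecutive blocks of frames, then average the
    pooled blocks into a single song-level feature vector."""
    n_blocks = hidden.shape[0] // pool_size
    blocks = hidden[: n_blocks * pool_size].reshape(n_blocks, pool_size, -1)
    return blocks.max(axis=1).mean(axis=0)

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 40))                 # synthetic stand-in for spectrogram frames
whitened = pca_whiten(frames, n_components=16)

# During training, the autoencoder would see corrupted inputs and be
# asked to reconstruct the clean ones; training itself is omitted here.
noisy = corrupt(whitened, drop_prob=0.3, rng=rng)

weights = rng.normal(scale=0.1, size=(16, 32))      # untrained placeholder weights
bias = np.zeros(32)
hidden = encode(whitened, weights, bias)            # clean input at feature-extraction time
song_vector = aggregate(hidden, pool_size=10)
```

In a full pipeline, `song_vector` would be computed per song and passed with the tag labels to a supervised classifier such as an MLP.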
Source
《华东理工大学学报(自然科学版)》
CAS
CSCD
PKU Core
2017, No. 2, pp. 241-247 (7 pages)
Journal of East China University of Science and Technology
Funding
National Natural Science Foundation of China (61271349)