摘要
光学乐谱识别是音乐信息检索中一项重要技术,音符识别是乐谱识别及其关键的部分.针对目前乐谱图像音符识别精度低、步骤冗杂等问题,设计了基于深度学习的端到端音符识别模型.该模型利用深度卷积神经网络,以整张乐谱图像为输入,直接输出音符的时值和音高.在数据预处理上,通过解析Music XML文件获得模型训练所需的乐谱图像和对应的标签数据,标签数据是由音符音高、音符时值和音符坐标组成的向量,因此模型通过训练来学习标签向量将音符识别任务转化为检测、分类任务.之后添加噪声、随机裁剪等数据增强方法来增加数据的多样性,使得训练出的模型更加鲁棒;在模型设计上,基于darknet53基础网络和特征融合技术,设计端到端的目标检测模型来识别音符.用深度神经网络darknet53提取乐谱图像特征图,让该特征图上的音符有足够大的感受野,之后将神经网络上层特征图和该特征图进行拼接,完成特征融合使得音符有更明显的特征纹理,从而让模型能够检测到音符这类小物体.该模型采用多任务学习,同时学习音高、时值的分类任务和音符坐标的回归任务,提高了模型的泛化能力.最后在Muse Score生成的测试集上对该模型进行测试,音符识别精度高,可以达到0.96的时值准确率和0.98的音高准确率.
Optical music recognition(OMR)is an important technology in music information retrieval.Note recognition is the key part of music score recognition.In view of the low accuracy of notes recognition and the cumbersome steps of the recognition of music score image,an end-to-end note recognition model based on deep learning is designed.The model uses the deep convolutional neural network to input the whole score image as the input,and directly outputs the duration and pitch of the note.In data preprocessing,the music image and the corresponding tag data required for model training were obtained by parsing the MusicXML file,the label data was a vector composed of note pitch,note duration and note coordinates,therefore,the model learned the label vector through training to transform the note recognition task into detection and classification tasks.Data enhancement methods such as noise and random cropping were added to increase the diversity of data,which made the trained model more robust.In the model design,based on the darknet53 basic network and feature fusion technology,an end-to-end target detection model was designed to recognize the notes.The deep neural network darknet53 was used to extract the feature image of the music image,so that the notes on the feature map had a large enough receptive field,and then the upper layer feature map of the neural network and the feature map were spliced,and the feature fusion is completed to make the note have more obvious feature and texture,allowing the model to detect small objects such as notes.The model adopted multi-task learning,and learned the pitch and duration classification task and note coordinates task,which improved the generalization ability of the model.Finally,the model was tested on the test set generated by MuseScore.The note recognition accuracy is high,and the duration accuracy of 0.96 and the pitch accuracy of 0.98 can be achieved.
作者
黄志清
贾翔
郭一帆
张菁
Huang Zhiqing;Jia Xiang;Guo Yifan;Zhang Jing(Faculty of Information Science,Beijing University of Technology,Beijing 100022,China)
出处
《天津大学学报(自然科学与工程技术版)》
EI
CSCD
北大核心
2020年第6期653-660,共8页
Journal of Tianjin University:Science and Technology
基金
北京市自然科学基金-市教委联合资助项目(KZ201910005007)。
关键词
光学乐谱识别
音符识别
深度学习
端到端
目标检测
optical music recognition
note recognition
deep learning
end-to-end
object detection