End-to-End Music Note Recognition Based on Deep Learning (cited by: 10)
Abstract Optical music recognition (OMR) is an important technology in music information retrieval, and note recognition is a critical part of it. To address the low accuracy and cumbersome multi-step pipelines of current note recognition on score images, an end-to-end note recognition model based on deep learning is designed. The model uses a deep convolutional neural network that takes the whole score image as input and directly outputs the duration and pitch of each note. For data preprocessing, the score images and the corresponding label data required for training are obtained by parsing MusicXML files. Each label is a vector composed of note pitch, note duration, and note coordinates, so by learning these label vectors the model turns note recognition into detection and classification tasks. Data augmentation methods such as added noise and random cropping are then applied to increase the diversity of the data and make the trained model more robust. For model design, an end-to-end object detection model is built on the darknet53 backbone network with feature fusion. The deep neural network darknet53 extracts a feature map of the score image on which the notes have a sufficiently large receptive field. This feature map is then concatenated with an upper-layer feature map of the network, and the resulting feature fusion gives the notes more distinct texture features, allowing the model to detect small objects such as notes. The model adopts multi-task learning, jointly learning the pitch and duration classification tasks and the note-coordinate regression task, which improves its generalization ability. Finally, the model is tested on a test set generated with MuseScore, where it reaches a duration accuracy of 0.96 and a pitch accuracy of 0.98.
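The label construction described in the abstract (pitch, duration, and coordinates per note, obtained by parsing MusicXML) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' code: the function name `note_labels`, the duration class table, the MIDI-style pitch encoding, and the use of the MusicXML `default-x`/`default-y` attributes as coordinates are all hypothetical choices. Mapping those layout values to pixel positions in a rendered score image is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical class tables; the paper's actual label encoding is not given.
DURATION_CLASSES = ["whole", "half", "quarter", "eighth", "16th"]
STEP_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_labels(musicxml_text):
    """Return one [pitch, duration_class, x, y] vector per pitched note."""
    root = ET.fromstring(musicxml_text)
    labels = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        note_type = note.findtext("type")
        if pitch is None or note_type not in DURATION_CLASSES:
            continue  # skip rests and durations outside the toy class table
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        alter = int(float(pitch.findtext("alter", default="0")))
        # Assumed pitch encoding: MIDI note number (C4 = 60).
        midi_pitch = 12 * (octave + 1) + STEP_SEMITONES[step] + alter
        # Assumed coordinates: MusicXML layout attributes, if present.
        x = float(note.get("default-x", "0"))
        y = float(note.get("default-y", "0"))
        labels.append([midi_pitch, DURATION_CLASSES.index(note_type), x, y])
    return labels


# A tiny hand-written MusicXML fragment: two pitched notes and one rest.
SAMPLE = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <note default-x="80.0" default-y="-40.0">
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>1</duration><type>quarter</type>
      </note>
      <note default-x="120.0" default-y="-25.0">
        <pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
        <duration>2</duration><type>half</type>
      </note>
      <note><rest/><duration>1</duration><type>quarter</type></note>
    </measure>
  </part>
</score-partwise>"""

print(note_labels(SAMPLE))
```

In this sketch the first two vector entries would serve as targets for the pitch and duration classification tasks and the last two for the coordinate regression task. The rest is skipped because it has no pitch.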
Authors Huang Zhiqing; Jia Xiang; Guo Yifan; Zhang Jing (Faculty of Information Science, Beijing University of Technology, Beijing 100022, China)
Source Journal of Tianjin University (Science and Technology), indexed in EI, CSCD, and PKU Core; 2020, No. 6, pp. 653-660 (8 pages)
Funding Joint Funding Project of the Beijing Natural Science Foundation and the Beijing Municipal Education Commission (KZ201910005007).
Keywords optical music recognition; note recognition; deep learning; end-to-end; object detection