Abstract
Previous work on Uyghur event extraction has mostly combined static word vectors with long short-term memory (LSTM) networks, an approach that cannot effectively handle polysemy or represent contextual semantics. For the target language, two Uyghur pre-trained language models are trained, and an event extraction method based on a joint question answering model combined with BiGRU (bidirectional gated recurrent units) is proposed. The pre-trained language model produces dynamic semantic vectors for the text, category representation information is fused in, and BiGRU further extracts text features. Experimental results show that the F1 value reaches 77.96% on the event recognition task and 74.89% on the event subject extraction task. Compared with the baseline NER method, the F1 value of the proposed method improves by 14.08%.
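The results above are reported as F1 values. As a reminder of what that metric measures, here is a minimal sketch of precision, recall, and F1 over extracted event subjects. The abstract does not state the matching criterion, so exact match on a (document id, event type, event subject) triple is an assumption made here for illustration. The function name and the toy data are hypothetical and do not come from the paper.

```python
from typing import Iterable, Set, Tuple

# Assumed record shape: (document id, event type, event subject).
Record = Tuple[str, str, str]


def precision_recall_f1(
    gold: Iterable[Record], predicted: Iterable[Record]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of records."""
    gold_set: Set[Record] = set(gold)
    pred_set: Set[Record] = set(predicted)

    # A prediction counts as correct only if the whole triple matches.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not taken from the paper's corpus.
gold = [
    ("d1", "merger", "Company A"),
    ("d2", "lawsuit", "Company B"),
    ("d3", "recall", "Company C"),
    ("d4", "merger", "Company D"),
]
pred = [
    ("d1", "merger", "Company A"),
    ("d2", "lawsuit", "Company X"),  # wrong subject, counts as an error
    ("d3", "recall", "Company C"),
]

p, r, f1 = precision_recall_f1(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

In the toy data, two of the three predictions match the four gold records, so precision is 2/3, recall is 1/2, and F1 is 4/7.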
Authors
ZHANG Peng-jie (张朋捷)
WANG Lei (王磊)
MA Bo (马博)
YANG Ya-ting (杨雅婷)
DONG Rui (董瑞)
Azmat·Anwar (艾孜麦提·艾瓦尼尔)
Affiliations
Multilingual Information Technology Lab, The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
University of Chinese Academy of Sciences, Beijing 100049, China
Xinjiang Laboratory of Minority Speech and Language Information Processing, The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2023, Issue 5, pp. 1487-1494 (8 pages)
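Funding
Youth Innovation Promotion Association of the Chinese Academy of Sciences (科发人函字[2019]26号)
National Natural Science Foundation of China (U2003303)
Xinjiang Tianshan Innovation Team Program (2020D14045)
National Key Research and Development Program of China (2017YFC0822505-4)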