Abstract
In word sense disambiguation for biomedical text, existing context representations suffer from low precision and neglect linguistic characteristics. To address this, a new language model based on Bi-LSTM is proposed. By taking the word order of the context into account, the model embeds the meaning of the entire sentence into a low-dimensional continuous space through unsupervised learning and generates a high-quality context representation. This representation is then used to construct ambiguity vectors, and cosine similarity is computed to classify the ambiguous words. Experiments show that, compared with traditional linear language models, the semantic vectors generated by the Bi-LSTM better represent the semantic information of ambiguous words and achieve high accuracy on different biomedical text datasets (95.01/91.27).
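The final classification step, comparing a context representation against per-sense vectors by cosine similarity, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the context vectors are hand-written toy values standing in for the output of a Bi-LSTM encoder, each sense vector is built by averaging example context vectors, and the names `build_sense_vectors` and `classify` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_sense_vectors(examples):
    """Group (sense, context_vector) pairs by sense and average each group.

    Averaging is an assumption made for illustration; the abstract does not
    say how the ambiguity vectors are constructed.
    """
    grouped = {}
    for sense, vec in examples:
        grouped.setdefault(sense, []).append(vec)
    return {sense: mean_vector(vecs) for sense, vecs in grouped.items()}


def classify(context_vec, sense_vectors):
    """Return the sense whose vector is most cosine-similar to the context."""
    return max(sense_vectors, key=lambda s: cosine(context_vec, sense_vectors[s]))


# Toy 3-dimensional vectors standing in for Bi-LSTM context representations
# of sentences containing the ambiguous word "cold".
examples = [
    ("cold_temperature", [0.9, 0.1, 0.0]),
    ("cold_temperature", [0.8, 0.2, 0.1]),
    ("cold_illness", [0.1, 0.9, 0.2]),
    ("cold_illness", [0.0, 0.8, 0.3]),
]
sense_vectors = build_sense_vectors(examples)
predicted = classify([0.2, 0.7, 0.1], sense_vectors)
print(predicted)
```

In a real system the toy vectors would be replaced by representations produced by the trained encoder; only the comparison logic is shown here.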
Authors
罗曜儒
李智
LUO Yao-ru; LI Zhi (Electrical Engineering Department, University of Sichuan, Chengdu 610065, China)
Source
《软件导刊》
2019, No. 4, pp. 57-59, 63 (4 pages in total)
Software Guide
Keywords
word sense disambiguation
Bi-LSTM
unsupervised learning
biomedical domain
context representation