Abstract
This paper uses a density-based distributed embedding and presents a method for learning representations in the space of Gaussian distributions, so as to better capture uncertainty about a representation and its relationships, and to express the asymmetry between words more naturally than dot-product or cosine similarity. In addition, exploiting the characteristics of Chinese characters, the semantic information of the components that make up a character (its sub-characters) is incorporated into word-representation training. Compared with existing methods, the model performs better in areas such as word similarity and downstream tasks, and it better expresses the uncertainty of words.
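The asymmetry claim can be illustrated with a minimal sketch. The abstract does not say which energy function the model uses; the KL divergence between diagonal-covariance Gaussians below is an assumption, chosen because it is a common asymmetric measure for Gaussian embeddings. The function names and the toy vectors `broad` and `narrow` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance.

    Assumption: each word is a Gaussian with mean `mu` and per-dimension
    variance `var`; the paper's actual energy function may differ.
    """
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    d = mu_p.size
    trace_term = np.sum(var_p / var_q)
    maha_term = np.sum((mu_q - mu_p) ** 2 / var_q)
    logdet_term = np.sum(np.log(var_q)) - np.sum(np.log(var_p))
    return 0.5 * (trace_term + maha_term - d + logdet_term)


def cosine(u, v):
    """Cosine similarity of two point vectors (symmetric by construction)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy embeddings: a broad (general) word and a narrow (specific) one.
broad = (np.array([1.0, 1.0]), np.full(2, 4.0))
narrow = (np.array([1.5, 1.5]), np.full(2, 0.25))

kl_narrow_broad = kl_diag_gaussians(*narrow, *broad)
kl_broad_narrow = kl_diag_gaussians(*broad, *narrow)

print("KL(narrow || broad):", kl_narrow_broad)
print("KL(broad || narrow):", kl_broad_narrow)
print("cosine either way:  ", cosine(narrow[0], broad[0]))
```

In this toy setup the divergence from the narrow distribution to the broad one is smaller than the reverse, whereas cosine similarity of the two means is the same in both directions. The variances are what carry the per-word uncertainty that a point vector cannot express.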
Authors
易洁
钟茂生
刘根
王明文
YI Jie; ZHONG Mao-sheng; LIU Gen; WANG Ming-wen (Computer and Information Engineering College, Jiangxi Normal University, Nanchang 330022, Jiangxi, China)
Source
《山东大学学报(理学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2021, Issue 5, pp. 85-91 (7 pages)
Journal of Shandong University (Natural Science)
Funding
Supported by the National Natural Science Foundation of China (61877031, 61876074).
Keywords
word representation learning
Gaussian distribution
Chinese character components
semantic uncertainty