
Chinese Word Representation Learning Based on Gaussian Distribution and Chinese Character Component Features
Abstract This paper uses a density-based distributed embedding and presents a method for learning representations in the space of Gaussian distributions. The aim is to better capture uncertainty about a representation and its relationships, and to express the asymmetry between words more naturally than dot-product or cosine similarity can. In addition, in view of the characteristics of Chinese characters, the semantic information of the components that make up a character (its sub-characters) is added to word representation training. Compared with existing methods, the model performs better on word similarity and downstream tasks, and it expresses the uncertainty of words better.
Authors YI Jie; ZHONG Mao-sheng; LIU Gen; WANG Ming-wen (Computer and Information Engineering College, Jiangxi Normal University, Nanchang 330022, Jiangxi, China)
Source Journal of Shandong University (Natural Science), indexed in CAS, CSCD, and the Peking University Core Journals list, 2021, No. 5, pp. 85-91 (7 pages)
Funding Supported by the National Natural Science Foundation of China (61877031, 61876074).
Keywords word representation learning; Gaussian distribution; Chinese character components; semantic uncertainty
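The abstract's claim that a Gaussian representation expresses asymmetry more naturally than cosine similarity can be illustrated with a small sketch. It assumes diagonal covariances and uses KL divergence as the asymmetric measure; the abstract does not say which measure the paper uses, so this is one plausible choice and not the authors' method. The two word vectors and variances below are hypothetical values chosen for illustration only.

```python
import math


def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for diagonal Gaussians."""
    total = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        # Per-dimension terms: trace ratio, scaled mean gap, -1, log-variance ratio.
        total += v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
    return 0.5 * total


def cosine(u, v):
    """Cosine similarity between two point vectors (symmetric by construction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-d embeddings (assumed values, not taken from the paper):
# a broad concept gets a large variance, a specific one a small variance.
animal_mu, animal_var = [1.0, 1.0], [4.0, 4.0]
dog_mu, dog_var = [2.0, 1.0], [1.0, 1.0]

kl_dog_animal = kl_diag_gaussians(dog_mu, dog_var, animal_mu, animal_var)
kl_animal_dog = kl_diag_gaussians(animal_mu, animal_var, dog_mu, dog_var)

print("KL(dog || animal):", kl_dog_animal)
print("KL(animal || dog):", kl_animal_dog)
print("cosine(dog, animal):", cosine(dog_mu, animal_mu))
print("cosine(animal, dog):", cosine(animal_mu, dog_mu))
```

In this toy setup the divergence from the narrow distribution to the broad one is smaller than the reverse, while cosine similarity on the means gives the same value in both directions. The variances also carry the uncertainty information that a single point vector cannot hold.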
