A key aspect of knowledge fusion is entity matching. This study investigates how to identify heterogeneous expressions of the same real-world entity. In recent years, several representative works have applied deep learning to entity matching with good results. However, these methods share two limitations: they assume that the different attribute columns of an entity are independent, and they feed the model paired entity records, which causes repeated computation. In fact, there are often latent relations between the attribute columns of different entities. These relations can improve matching quality, and extracting features from each single entity record avoids the repeated computation. To use attribute relations to assist entity matching, this paper proposes the Relation-aware Entity Matching method, which embeds attribute relations into the original entity description to form sentences, so that entity matching becomes a sentence-level similarity task, with the similarity computed by Sentence-BERT. Experiments were conducted on structured, dirty, and textual data and compared against recent baselines. The results show that relational embedding helps entity matching on structured and dirty data. The method achieves good results on most datasets while reducing repeated computation.
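The idea of turning each entity record into a sentence and comparing sentences can be sketched as follows. This is only an illustration under stated assumptions: the `"<attribute> is <value>"` template, the sample records, and the function names are hypothetical, and a simple token-count vector stands in for the Sentence-BERT encoder used in the paper.

```python
import math
import re
from collections import Counter


def serialize(record):
    """Join attribute names and values into one sentence (assumed template)."""
    return ", ".join(f"{attr} is {value}" for attr, value in record.items() if value)


def embed(sentence):
    """Toy stand-in for a sentence encoder: lowercase word counts."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical records from two sources.
records = {
    "a1": {"title": "iphone 12 64gb black", "brand": "apple"},
    "b1": {"title": "apple iphone 12 black 64gb", "brand": "apple"},
    "b2": {"title": "galaxy s21 128gb", "brand": "samsung"},
}

# Each record is encoded once; the vectors are reused for every comparison.
vectors = {rid: embed(serialize(rec)) for rid, rec in records.items()}

sim_match = cosine(vectors["a1"], vectors["b1"])
sim_nonmatch = cosine(vectors["a1"], vectors["b2"])
print(f"a1 vs b1: {sim_match:.3f}, a1 vs b2: {sim_nonmatch:.3f}")
```

Because each record is encoded independently, comparing one record against many candidates reuses its vector instead of re-running the model on every pair, which is the saving the abstract refers to.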
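In intelligent question answering for the culture and tourism domain, sparse textual representations of user questions, colloquial phrasing, polysemy, and the difficulty of recognizing domain-specific vocabulary make it hard for common matching models to match user questions accurately to standard questions. To address this, the paper constructs a culture-and-tourism customer-service question matching dataset and a corresponding domain dictionary, and on that basis proposes SBIDD (Improved SBERT Model for Integrating Domain Dictionaries), a question matching model that integrates the domain dictionary. The model uses Sentence-BERT to produce vector representations of questions and incorporates the domain dictionary into the Siamese network to increase the weight of domain words in a question, which substantially improves the model's ability to recognize domain vocabulary. Experiments were run on the self-built dataset and on the public ATEC 2018 NLP dataset. The results show that the model outperforms five classic text matching models (DSSM, BiMPM, ESIM, IMAF, and TSFR-RM) as well as the SBERT baseline, reaching an F1 score of 95.65%, 2.75% higher than the baseline, and that it shows better adaptability and robustness on retrieval tasks.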
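The effect of raising the weight of domain words can be illustrated with a small sketch. It is not the SBIDD model: a weighted bag-of-words vector stands in for the Sentence-BERT representation, and the dictionary entries, the boost factor, and the example questions are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical domain dictionary and boost factor.
DOMAIN_TERMS = {"ticket", "refund", "cableway"}
DOMAIN_WEIGHT = 3.0


def weighted_vector(question, boost):
    """Word counts, with dictionary words multiplied by `boost`."""
    counts = Counter(re.findall(r"\w+", question.lower()))
    return {tok: c * (boost if tok in DOMAIN_TERMS else 1.0) for tok, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse weighted vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(user, standard, boost):
    return cosine(weighted_vector(user, boost), weighted_vector(standard, boost))


user_q = "how can i get my money back for the cableway"
standard_a = "how do i refund a cableway ticket"    # intended match
standard_b = "how can i get to the scenic area"     # shares only filler words

for boost in (1.0, DOMAIN_WEIGHT):
    print(boost, round(score(user_q, standard_a, boost), 3),
          round(score(user_q, standard_b, boost), 3))
```

With no boost, the question that shares only common words scores higher; once the dictionary word is weighted up, the intended standard question wins.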
Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that utilizes carefully crafted templates and context examples selected based on semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
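A minimal sketch of selecting context examples by similarity and filling a prompt template might look like the following. The template wording, the example pool, and the function names are hypothetical, and token overlap (Jaccard) stands in for the semantic similarity measure; the abstract does not specify either, and no language model is called here.

```python
import re


def tokens(text):
    """Lowercased word set of a sentence."""
    return set(re.findall(r"\w+", text.lower()))


def similarity(a, b):
    """Jaccard overlap, a toy stand-in for embedding-based semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(query, pool, k=2):
    """Return the k labeled examples most similar to the query sentence."""
    return sorted(pool, key=lambda ex: similarity(query, ex["text"]), reverse=True)[:k]


def build_prompt(query, examples):
    """Fill an assumed few-shot template with the selected examples."""
    lines = ["Extract the named entities from each sentence."]
    for ex in examples:
        lines.append(f"Sentence: {ex['text']}")
        lines.append(f"Entities: {ex['entities']}")
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Hypothetical labeled pool.
pool = [
    {"text": "Alice Smith joined Acme Corp in Berlin.",
     "entities": "Alice Smith (PER); Acme Corp (ORG); Berlin (LOC)"},
    {"text": "The river flooded the valley last spring.",
     "entities": "none"},
    {"text": "Bob Lee joined Globex in Paris.",
     "entities": "Bob Lee (PER); Globex (ORG); Paris (LOC)"},
]

query = "Carol Jones joined Initech in Madrid."
chosen = select_examples(query, pool, k=2)
prompt = build_prompt(query, chosen)
print(prompt)
```

The resulting prompt would then be sent to the language model as-is, with no fine-tuning; the model's completion after the final `Entities:` line is read as the prediction.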
Funding: This work is funded by the Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515012307, 2020A1515010450), the Guangzhou Basic and Applied Basic Research Foundation (No. 202102021207, 202102020867), the National Natural Science Foundation of China (No. 62072130, 61702220, 61702223), the Guangdong Province Key Area R&D Program of China (No. 2019B010136003, 2019B010137004), the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019), the Guangdong Higher Education Innovation Group (No. 2020KCXTD007), and the Guangzhou Higher Education Innovation Group (No. 202032854).