Funding: Supported by the Sichuan Science and Technology Program under Grant No. 2021YFQ0009.
Abstract: The extraction and understanding of text knowledge have become increasingly crucial in the age of big data. Because Chinese vocabulary is diverse and ambiguous, one current research area in natural language processing (NLP) is how to understand text accurately and collect accurate linguistic information. This paper mainly studies the candidate entity generation module of an entity linking system. The candidate entity generation module constructs an entity reference expansion algorithm to improve the recall rate of candidate entities. To improve the efficiency of the linking algorithm of the entire system while preserving the recall rate of candidate entities, we design a graph model filtering algorithm that fuses shallow semantic information to filter the list of candidate entities, and we verify and analyze the efficiency of the algorithm through experiments. By analyzing the technology underlying entity linking, in particular candidate entity generation and entity disambiguation, we improve the traditional entity linking algorithm and present an innovative and practical entity linking model. The recall rate exceeds 82%, and the link accuracy rate exceeds 73%. Efficient and accurate entity linking can help machines better understand text semantics, further promoting the development of NLP and improving users' experience of acquiring knowledge from text.
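As a rough illustration of how expanding an entity reference can raise candidate recall, the following minimal sketch looks up a mention and its expanded forms in an alias index. It is not the paper's algorithm: the knowledge base `kb`, the `abbreviations` table, and all function names are hypothetical and exist only for this example.

```python
from collections import defaultdict


def build_alias_index(kb):
    """Map each lowercased alias to the set of entity ids that carry it."""
    index = defaultdict(set)
    for entity_id, aliases in kb.items():
        for alias in aliases:
            index[alias.lower()].add(entity_id)
    return index


def expand_mention(mention, abbreviations):
    """Return the mention plus any expansions listed in the (hypothetical) table."""
    key = mention.lower()
    forms = {key}
    forms.update(form.lower() for form in abbreviations.get(key, []))
    return forms


def generate_candidates(mention, index, abbreviations):
    """Collect every entity whose alias matches any expanded form of the mention."""
    candidates = set()
    for form in expand_mention(mention, abbreviations):
        candidates |= index.get(form, set())
    return sorted(candidates)


# Toy data, invented for illustration only.
kb = {
    "Q1": ["Natural language processing", "NLP"],
    "Q2": ["Neuro-linguistic programming", "NLP"],
    "Q3": ["Peking University", "PKU"],
}
abbreviations = {"beida": ["Peking University"]}
index = build_alias_index(kb)

print(generate_candidates("NLP", index, abbreviations))
print(generate_candidates("Beida", index, abbreviations))
```

Without the expansion step, the mention "Beida" would match no alias in this toy index; with it, the lookup reaches the entity for Peking University. An ambiguous mention such as "NLP" returns several candidates, which a later filtering or disambiguation stage would have to narrow down.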
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61806020, 61772082, 61972047, 61702296), the National Key Research and Development Program of China (2017YFB0803304), the Beijing Municipal Natural Science Foundation (4182043), the CCF-Tencent Open Fund, and the Fundamental Research Funds for the Central Universities.
Abstract: Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities that share common properties. ESE is important for many applications, such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods have employed knowledge graphs (KGs) to extend entities. However, they failed to utilize the rich semantics contained in a KG effectively and efficiently, and they ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relations among seed entities in a KG and thus to retrieve candidate entities. We then rank the entities according to meta-path-based structural similarity. Furthermore, to utilize the text descriptions of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines the structural information revealed by a KG with the text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
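To make the idea of retrieving and ranking candidates along a meta path concrete, here is a minimal sketch that assumes a single two-hop path: follow one relation from a seed to an intermediate node, then follow the same relation backward to other entities. The scoring (number of seeds that reach a candidate) is a simplification chosen for illustration and is not the similarity measure used by CoMeSE++; the triples and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Index triples by (head, relation) and by (tail, relation)."""
    forward = defaultdict(set)   # (head, relation) -> tails
    backward = defaultdict(set)  # (tail, relation) -> heads
    for head, relation, tail in triples:
        forward[(head, relation)].add(tail)
        backward[(tail, relation)].add(head)
    return forward, backward


def rank_by_shared_relation(seeds, relation, forward, backward):
    """Score each non-seed entity by how many seeds reach it via relation / inverse relation."""
    seed_set = set(seeds)
    scores = defaultdict(int)
    for seed in seeds:
        reached = set()
        for middle in forward.get((seed, relation), ()):
            reached |= backward.get((middle, relation), set())
        for entity in reached - seed_set:
            scores[entity] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy knowledge graph, invented for illustration only.
triples = [
    ("Beijing", "capital_of", "China"),
    ("Tokyo", "capital_of", "Japan"),
    ("Beijing", "located_in", "Asia"),
    ("Tokyo", "located_in", "Asia"),
    ("Seoul", "located_in", "Asia"),
    ("Paris", "located_in", "Europe"),
]
forward, backward = build_adjacency(triples)
ranking = rank_by_shared_relation(["Beijing", "Tokyo"], "located_in", forward, backward)
print(ranking)
```

In this toy graph, Seoul is reachable from both seeds through the shared "located_in" target, while Paris is not reachable from either, so only Seoul is returned as a candidate. A fuller treatment would consider several meta paths of different types and weight them, as the abstract describes.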
Funding: This work is supported by the National Key Research and Development Program of China (2017YFB1002101), the NSFC Key Project (U1736204), and a grant from the Beijing Academy of Artificial Intelligence (BAAI2019ZD0502).
Abstract: With the development of tourism knowledge graphs (KGs), the recommendation, question answering (QA), and other functions they support enable various applications to better understand users and provide services. Existing Chinese tourism KGs do not contain enough entity information and relations. In addition, the stored knowledge usually covers only the text modality and lacks other modalities such as images. In this paper, a multi-modal Chinese tourism knowledge graph (MCTKG) based on Beijing tourist attractions is proposed to support QA and help tourists plan tourism routes. An MCTKG ontology was constructed to maintain the semantic consistency of heterogeneous data sources. To increase the number of entities and relations related to the tourist attractions in MCTKG, entities belonging to the concepts of building, organization, relic, and person were automatically expanded based on Baidu Encyclopedia. In addition, based on the types of tourist attractions and the styles of tourism routes, a tourism route generation algorithm was proposed that automatically schedules tourism routes by incorporating tourist attractions and the route style. Experimental results show that the generated tourist routes achieve satisfaction similar to that of tourism routes crawled from specific travel websites.