Funding: The State Grid Technology Project (No. 5108202340042A-1-1-ZN).
Abstract: To address the low efficiency of approximate queries caused by the large size of real-world knowledge graphs, an embedding-based approximate query method is proposed. First, the nodes in the query graph are classified according to the degree of approximation required for each type of node. This classification transforms the query problem into three constraints, from which approximate information is extracted. Second, candidates are generated by calculating the similarity between embeddings. Finally, a deep neural network model is designed, incorporating a loss function based on the high-dimensional ellipsoidal diffusion distance. This model identifies the distance between nodes using their embeddings and constructs a score function, and k nodes are returned as the query results. The results show that the proposed method can return both exact and approximate matching results. On the DBLP (DataBase systems and Logic Programming) and FUA-S (Flight USA Airports-Sparse) datasets, the method exhibits superior precision and recall and returns results in 0.10 s and 0.03 s, respectively, which indicates greater efficiency than PathSim and the other comparative methods.
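The candidate-generation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the names `cosine_similarity`, `top_k_candidates`, and the toy embeddings are hypothetical. The neural scoring model and the ellipsoidal diffusion distance loss are not reproduced.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_candidates(
    query_vec: List[float],
    embeddings: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank every node by similarity to the query embedding and keep the best k."""
    scored = [
        (node, cosine_similarity(query_vec, vec))
        for node, vec in embeddings.items()
    ]
    # Highest similarity first (assumed ranking rule, not taken from the paper).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings, purely illustrative; real embeddings would come from a
# trained knowledge-graph embedding model.
embeddings = {
    "author_a": [1.0, 0.0],
    "author_b": [0.9, 0.1],
    "author_c": [0.0, 1.0],
}
results = top_k_candidates([1.0, 0.0], embeddings, k=2)
```

In this sketch an exact match (`author_a`) and a near match (`author_b`) are both returned, which mirrors the abstract's claim that exact and approximate results can be retrieved together.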