Subgraph matching is a fundamental problem in graph search and is NP-complete. Recently, it has become a popular research topic in knowledge graph analysis, with a wide range of applications including question answering and semantic search. In this paper, we study subgraph matching on knowledge graphs. Specifically, given a query graph q and a data graph G, the problem is to find all subgraph isomorphic mappings of q on G. A knowledge graph is a directed labeled multigraph that may have multiple edges between a pair of vertices, and it has denser semantic and structural features than a general graph. To accelerate subgraph matching on knowledge graphs, we propose a novel subgraph matching algorithm based on a subgraph index, called FGqT-Match. The algorithm has two key designs. The first is a subgraph index, the matching-driven flow graph (FGqT), which eliminates redundant calculations in advance. The second is a multi-label weight matrix, which is used to evaluate a near-optimal matching tree that minimizes the number of intermediate candidates. With these two designs, all subgraph isomorphic mappings are obtained quickly just by traversing FGqT. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.
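The problem statement above (find every subgraph isomorphic mapping of q on G in a directed labeled multigraph) can be illustrated with a plain backtracking search. This is only a minimal sketch of the problem definition, not the FGqT-Match algorithm: it uses no index and no optimized matching order. The function name `subgraph_matches`, the edge-triple representation, and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, Iterator, Set, Tuple

Edge = Tuple[str, str, str]  # (source vertex, edge label, target vertex)


def subgraph_matches(
    q_labels: Dict[str, str],
    q_edges: Set[Edge],
    g_labels: Dict[str, str],
    g_edges: Set[Edge],
) -> Iterator[Dict[str, str]]:
    """Yield every injective, label- and edge-preserving mapping of q into G."""
    order = list(q_labels)  # naive matching order (hypothetical; not optimized)

    def consistent(mapping: Dict[str, str]) -> bool:
        # Every query edge whose endpoints are both mapped must exist in G
        # with the same label and direction.
        return all(
            (mapping[s], lab, mapping[t]) in g_edges
            for s, lab, t in q_edges
            if s in mapping and t in mapping
        )

    def extend(i: int, mapping: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(order):
            yield dict(mapping)  # copy, because mapping is mutated on backtrack
            return
        u = order[i]
        used = set(mapping.values())  # enforce injectivity
        for v, v_label in g_labels.items():
            if v in used or v_label != q_labels[u]:
                continue
            mapping[u] = v
            if consistent(mapping):
                yield from extend(i + 1, mapping)
            del mapping[u]

    yield from extend(0, {})


# Toy multigraph: two parallel edges with different labels between x and y.
q_labels = {"x": "Person", "y": "City"}
q_edges = {("x", "bornIn", "y"), ("x", "livesIn", "y")}
g_labels = {"alice": "Person", "bob": "Person", "paris": "City", "rome": "City"}
g_edges = {
    ("alice", "bornIn", "paris"),
    ("alice", "livesIn", "paris"),
    ("bob", "bornIn", "rome"),
    ("bob", "livesIn", "paris"),
}
matches = list(subgraph_matches(q_labels, q_edges, g_labels, g_edges))
```

Here only alice and paris are joined by both a bornIn and a livesIn edge, so `matches` contains the single mapping of x to alice and y to paris. The search is exponential in the worst case, which is why indexing and candidate pruning of the kind the abstract describes matter in practice.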
Graph pattern matching (GPM) can be used to mine key information in graphs. Exact GPM, one of the most commonly used GPM methods, aims to find exactly all subgraphs of a data graph that match a given query graph. Exact GPM has been widely used in biological data analysis, social network analysis, and other fields. In this paper, the applications of exact GPM are first introduced and the research progress on exact GPM is summarized. The related algorithms are then introduced in detail, and experiments on state-of-the-art exact GPM algorithms are conducted to compare their performance. Based on the experimental results, the scenarios to which each algorithm applies are pointed out, and new research opportunities in this area are proposed.
Privacy preservation is a primary concern in social networks, which employ a variety of privacy-preservation mechanisms to protect sensitive user information such as age, location, education, and interests. Matching user identities across different social networks is considered challenging. In this work, we propose an algorithm to reveal user identities as a set of linked accounts from different social networks using limited user profile data, i.e., username and friendship. Thus, we propose a framework, ExpandUIL, that includes three standalone algorithms: (i) the ExpandFullName algorithm, based on percolation graph matching; (ii) a supervised machine learning algorithm that works with graph embeddings; and (iii) the ExpandUserLinkage algorithm, a combination of the two. The proposed framework is significant because (i) it is based on the network topology and requires only the name feature of the nodes; (ii) it requires very few initial seeds, and as few as one suffices; (iii) it is iterative and scalable, and applicable to online incoming stream graphs; and (iv) its stability has been demonstrated experimentally on a real ground-truth dataset. Experiments on real datasets from the Instagram and VK social networks show up to 75% recall for linked accounts with 96% accuracy using only one given seed pair.
Discovering regularities between entities in temporal graphs is vital for many real-world applications (e.g., social recommendation, emergency event detection, and cyberattack event detection). This paper proposes temporal graph association rules (TGARs), which extend traditional graph-pattern association rules on static graphs by incorporating temporal information and constraints. We introduce quality measures (e.g., support, confidence, and diversification) to characterize meaningful TGARs that are useful and diversified. The proposed support metric is an upper bound for alternative metrics, allowing us to guarantee a superset of patterns. We extend conventional confidence measures in terms of maximal occurrences of TGARs. The diversification score strikes a balance between interestingness and diversity. Although the problem is NP-hard, we develop an effective discovery algorithm that integrates TGAR generation and TGAR selection, and show that mining TGARs is feasible over a temporal graph. We propose pruning strategies that filter out, as early as possible, TGARs that have low support or cannot make the top-k. Moreover, we design an auxiliary data structure to prune TGARs that do not meet the constraints during TGAR generation, which avoids repeating subgraph matching for each extension in the search space. We experimentally verify the effectiveness, efficiency, and scalability of our algorithms in discovering diversified top-k TGARs from temporal graphs in real-life applications.
gStore is an open-source native Resource Description Framework (RDF) triple store that answers SPARQL queries by subgraph matching over RDF graphs. However, there are some deficiencies in the original system design, such as in answering simple queries (including one-triple pattern queries). To improve the efficiency of the system, we reconsider the system design in this paper. Specifically, we propose a new query plan generation module that generates different query plans according to the structures of query graphs. Furthermore, we redesign our vertex encoding strategy to achieve more pruning power, and we propose a new multi-join algorithm to speed up the subgraph matching process. Extensive experiments on synthetic and real RDF datasets show that our method significantly outperforms the state-of-the-art algorithms.
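To make the idea of answering a SPARQL query by matching over an RDF graph concrete, the sketch below evaluates a basic graph pattern (a list of triple patterns with `?`-prefixed variables) against a small list of triples using nested-loop matching. It is a hedged illustration only: it does not reflect gStore's query plans, vertex encoding, or multi-join algorithm, and the function name `match_bgp` and the sample data are assumptions. Unlike subgraph isomorphism, this matching lets two variables bind to the same term, as SPARQL basic graph pattern matching does.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match_bgp(patterns: List[Triple], triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings that satisfy every triple pattern."""

    def unify(pattern: Triple, triple: Triple, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the value it was bound to earlier.
                if new.setdefault(term, value) != value:
                    return None
            elif term != value:
                # A constant must match the triple's term exactly.
                return None
        return new

    def solve(i: int, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(patterns):
            yield binding
            return
        for triple in triples:
            extended = unify(patterns[i], triple, binding)
            if extended is not None:
                yield from solve(i + 1, extended)

    yield from solve(0, {})


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "paris"),
]
# Pairs of acquaintances who live in the same city.
patterns = [("?a", "knows", "?b"), ("?a", "livesIn", "?c"), ("?b", "livesIn", "?c")]
answers = list(match_bgp(patterns, triples))
# A one-triple pattern query, the simple case the abstract singles out.
one_triple = list(match_bgp([("?s", "livesIn", "paris")], triples))
```

For a one-triple pattern, the search degenerates into a single scan of the triples, which suggests why a system might choose a different, cheaper plan for such queries than for multi-pattern joins.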
Funding (for the FGqT-Match paper above): the National Natural Science Foundation of China (Grant Nos. 61976032, 62002039).
Funding (for the TGAR paper above): This work was partially supported by the National Key Research and Development Program (No. 2018YFB1800203), the National Natural Science Foundation of China (No. U19B2024), and the Postgraduate Scientific Research Innovation Project of Hunan Province (No. CX20210038).