The Tor dark web network has been reported to provide a breeding ground for criminals and fraudsters who exploit vulnerabilities in the network to carry out illicit and unethical activities. The network has unfortunately become a means to perpetrate crimes such as illegal drug and firearm trafficking, violence and terrorist activities, among others. Governments and law enforcement agencies are working relentlessly to control the misuse of the Tor network. This study belongs to the same line of work and attempts to propose a link-based ranking technique to rank and identify the influential hidden services in the Tor dark web. The proposed method considers the extent of connectivity to surface web services and the values of the centrality metrics of a hidden service in the web graph for ranking. A modified PageRank algorithm is used to obtain the overall rankings of the hidden services in the dataset. Several graph metrics were used to compare the effectiveness of the proposed technique against other commonly known ranking procedures in the literature. The proposed ranking technique is shown to produce good results in identifying the influential domains in the Tor network.
Funding: Supported by Taif University Researchers Supporting Project Number (TURSP-2020/231), Taif University, Taif, Saudi Arabia.
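To illustrate the kind of link-based ranking described above, the following is a minimal sketch of plain PageRank over a toy directed graph of hidden and surface services using networkx. The graph, the node names and the damping factor are illustrative assumptions; the paper's modified PageRank, which additionally weighs surface-web connectivity and centrality metrics, is not reproduced here.

```python
# Minimal sketch: generic PageRank over a toy graph of hidden (.onion) and
# surface services. Not the paper's modified variant; all edges are invented.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("hiddenA.onion", "surface1.com"),
    ("hiddenB.onion", "hiddenA.onion"),
    ("hiddenC.onion", "hiddenA.onion"),
    ("surface1.com", "hiddenB.onion"),
])

scores = nx.pagerank(G, alpha=0.85)  # standard damping factor
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:20s} {score:.4f}")
```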
Effective link analysis techniques are needed to help law enforcement and intelligence agencies fight money laundering. This paper presents a link analysis technique that uses a modified shortest-path algorithm to identify the strongest association paths between entities in a money laundering network. Based on the two-tree Dijkstra and Priority-First-Search (PFS) algorithms, a modified algorithm is presented. To apply the algorithm, a network representation transformation is made first.
Funding: Supported by the National Tenth Five-Year Plan for Scientific and Technological Development of China (2001BA102A06-11).
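As a hedged illustration of the network representation transformation mentioned above, association strengths in (0, 1] can be converted to additive costs with a negative-log transform so that an ordinary shortest-path search returns the strongest association path. This is a generic sketch, not the paper's two-tree Dijkstra/PFS variant; the entity names and strengths are invented.

```python
# Sketch: find the strongest association path by turning strengths in (0, 1]
# into additive costs via -log(strength), then running ordinary Dijkstra.
import math
import networkx as nx

strengths = [
    ("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.5), ("C", "D", 0.7),
]

G = nx.Graph()
for u, v, s in strengths:
    G.add_edge(u, v, cost=-math.log(s))  # stronger association -> smaller cost

path = nx.shortest_path(G, "A", "D", weight="cost")
strength = math.exp(-nx.shortest_path_length(G, "A", "D", weight="cost"))
print(path, round(strength, 3))  # path strength is the product 0.9 * 0.8 * 0.7
```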
Select link analysis provides information on where the traffic at selected links comes from and goes to. This disaggregate information has wide applications in practice. State-of-the-art planning software packages often adopt the user equilibrium (UE) model for select link analysis. However, empirical studies have repeatedly shown that the stochastic user equilibrium model predicts the observed mean and variance of choices more accurately than the UE model. This paper proposes an alternative select link analysis method based on the recently developed logit-weibit hybrid model, which alleviates the drawbacks of both the logit and weibit models while retaining a closed-form route choice probability expression. To enhance applicability in large-scale networks, Bell's stochastic loading method, originally developed for the logit model, is adapted to the hybrid model. The features of the proposed method are twofold: (1) a unique O-D-specific link flow pattern and more plausible behavioral realism attributed to the hybrid route choice model, and (2) applicability in large-scale networks due to the link-based stochastic loading method. An illustrative network example and a case study on a large-scale network demonstrate the efficiency and effectiveness of the proposed select link analysis method as well as applications of O-D-specific link flow information. A visualization method is also proposed to enhance the understanding of O-D-specific link flow, which is originally in the form of a matrix.
Funding: Supported by the National Natural Science Foundation of China (51408433), the Fundamental Research Funds for the Central Universities of China, and the Chenguang Program sponsored by the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission.
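To make the closed-form route choice idea concrete, the sketch below computes multinomial logit route probabilities for a toy set of routes between one O-D pair. The dispersion parameter and route costs are illustrative assumptions, and the paper's logit-weibit hybrid and Bell's link-based loading are not reproduced here.

```python
# Sketch: closed-form multinomial logit route-choice probabilities for one
# O-D pair. Only the plain logit part of the hybrid model is shown.
import math

route_costs = {"r1": 10.0, "r2": 12.0, "r3": 15.0}  # illustrative travel costs
theta = 0.5                                         # dispersion parameter (assumed)

expo = {r: math.exp(-theta * c) for r, c in route_costs.items()}
denom = sum(expo.values())
probs = {r: v / denom for r, v in expo.items()}
print(probs)  # shorter routes receive higher choice probability
```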
A mix of numerical and nominal data types is common in many modern-age data collections. Examples include banking data, sales histories and healthcare records, where both continuous attributes like age and nominal ones like blood type are exploited to characterize account details, business transactions or individuals. However, only a few standard clustering techniques and consensus clustering methods have been provided to examine such data thus far. Given this insight, the paper introduces novel extensions of the link-based cluster ensemble, LCEWCT and LCEWTQ, that are accurate for analyzing mixed-type data. They promote diversity within an ensemble through different initializations of the k-prototypes algorithm as base clusterings and then refine the summarized data using a link-based approach. Based on the evaluation metric of NMI (Normalized Mutual Information), averaged across different combinations of benchmark datasets and experimental settings, these new models reach an improved level of 0.34, while the best model found in the literature obtains only around 0.24. Besides, the parameter analysis included herein helps to enhance their performance even further, given the relations between clustering quality and algorithmic variables specific to the underlying link-based models. Moreover, another significant factor, the ensemble size, is examined in such a way as to justify a tradeoff between complexity and accuracy.
Funding: This work is funded by Newton Institutional Links 2020-21 project 623718881, jointly by the British Council and the National Research Council of Thailand (www.britishcouncil.org). The first author is the project PI, with the other author participating as a Co-I.
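The NMI-based evaluation protocol described above can be illustrated with scikit-learn's implementation of normalized mutual information. The labelings below are invented toy data; the LCEWCT and LCEWTQ models themselves, and the k-prototypes base clusterings, are not implemented here.

```python
# Sketch: scoring candidate clusterings against ground truth with NMI and
# averaging, mirroring the evaluation protocol described above.
# The labelings are toy data, not the benchmark datasets of the paper.
from statistics import mean
from sklearn.metrics import normalized_mutual_info_score

truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
candidate_clusterings = [
    [0, 0, 1, 1, 1, 1, 2, 2, 2],   # e.g. one base-clustering initialization
    [2, 2, 2, 0, 0, 1, 1, 1, 1],   # another initialization (labels may permute)
]

scores = [normalized_mutual_info_score(truth, c) for c in candidate_clusterings]
print(mean(scores))
```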
The FuTURE 4G Time Division Duplex (TDD) trial system uses a 3.5 GHz carrier frequency and several crucial technologies, including broadband Multiple Input Multiple Output (MIMO) and Orthogonal Frequency Division Multiplexing (OFDM). These technologies challenge the link budget and networking analysis of the FuTURE 4G TDD trial network. This paper analyzes the practical 3.5 GHz propagation model and the link budget of the Radio Frequency (RF) parameters of the trial system. Moreover, it introduces the networking analysis and network planning of the trial system, which combine the field test results of the MIMO system. The FuTURE 4G TDD trial system and its trial network have been completed and successfully checked. The trial system fulfills all the requirements with two Access Points (AP) and three Mobile Terminals (MT), supporting multi-user operation, mobility, a high peak rate of 100 Mb/s, High-Definition TV (HDTV), high-speed data download, and Voice over IP (VoIP) services.
Funding: Supported by the National Natural Science Foundation of China under Grant 60496312 and the 863 Program of China under Grants 2003AA12331004 and 2006AA01Z260.
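A link budget of the kind analyzed above is essentially an accounting of gains and losses in dB. The sketch below computes free-space path loss and received power for an assumed 3.5 GHz link; all numerical values (distance, gains, transmit power) are illustrative assumptions, not the trial system's RF parameters.

```python
# Sketch: a simple link budget at 3.5 GHz using free-space path loss (FSPL).
# All parameter values are illustrative, not those of the FuTURE trial system.
import math

f_hz = 3.5e9          # carrier frequency
d_m = 1_000.0         # link distance in metres (assumed)
p_tx_dbm = 40.0       # transmit power (assumed)
g_tx_dbi = 17.0       # transmit antenna gain (assumed)
g_rx_dbi = 17.0       # receive antenna gain (assumed)

c = 3e8
fspl_db = 20 * math.log10(4 * math.pi * d_m * f_hz / c)
p_rx_dbm = p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db
print(f"FSPL = {fspl_db:.1f} dB, received power = {p_rx_dbm:.1f} dBm")
```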
This paper focuses on some key problems in web community discovery and link analysis. Based on topic-oriented technology, the characteristics of a bipartite graph are studied. An Х bipartite core set is introduced to define the extraction procedure more clearly. By scanning the topic subgraph to construct the Х bipartite graph and then pruning the graph with i and j, an Х bipartite core set, which is also the minimum element of a community, can be found. Finally, a hierarchical clustering algorithm is applied to many Х bipartite core sets, and the dendrogram of the community's inner construction is obtained. The correctness of the constructing and pruning method is proved and the algorithm is designed. The typical datasets in the experiment are prepared in the manner of HITS (hyperlink-induced topic search): ten topics and four search engines are chosen and the returned results are integrated. The modularity, a measure of the strength of the community structure in a social network, is used to validate the efficiency of the proposed method. The experimental results show that the proposed algorithm is effective and efficient.
Funding: Supported by the National Natural Science Foundation of China (No. 60773216), the National High Technology Research and Development Program of China (863 Program) (No. 2006AA010109), the Natural Science Foundation of Renmin University of China (No. 06XNB052), and the Free Exploration Project (985 Project of Renmin University of China) (No. 21361231).
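A common way to expose bipartite cores, in the spirit of the pruning step above, is to iteratively delete fans whose out-degree falls below a threshold j and centers whose in-degree falls below a threshold i. The sketch below does this on a toy directed web graph; the edges and thresholds are assumptions for illustration and do not follow the paper's exact Х core definition.

```python
# Sketch: iterative (i, j) pruning of a directed web graph so that every
# surviving fan links to >= j centers and every center is linked by >= i fans.
# Toy edges and thresholds only; not the paper's exact core definition.
import networkx as nx

edges = [("f1", "c1"), ("f1", "c2"), ("f2", "c1"), ("f2", "c2"),
         ("f3", "c1"), ("f3", "c3"), ("f4", "c3")]
G = nx.DiGraph(edges)
i, j = 2, 2  # center in-degree and fan out-degree thresholds (assumed)

changed = True
while changed:
    changed = False
    weak = [n for n in G if (0 < G.out_degree(n) < j) or (0 < G.in_degree(n) < i)]
    weak += [n for n in G if G.degree(n) == 0]  # drop nodes that lost all edges
    if weak:
        G.remove_nodes_from(set(weak))
        changed = True

print(sorted(G.nodes()))  # remaining fans and centers form candidate cores
```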
How to find such web communities is an important research problem. Recently, community discovery approaches have mainly been categorized into the HITS algorithm, the bipartite cores algorithm and the maximum flow/minimum cut framework. In this paper, we propose a new method to extract communities. The MCL algorithm (short for the Markov Cluster Algorithm), a fast and scalable unsupervised clustering algorithm, is used to extract communities. By placing the mirror-deletion procedure after graph clustering, we decrease the comparison cost considerably. After MCL and mirror deletion, we use a community member selection algorithm to produce the sets of community candidates. The experiments and results show that the new method works effectively and properly.
Funding: Supported by the 211 Project of the Ministry of Education of China.
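The MCL algorithm mentioned above alternates matrix expansion (random-walk spreading) with inflation (element-wise powering that sharpens strong flows). The sketch below is a bare-bones NumPy version on a toy adjacency matrix with an assumed inflation parameter; it is not an optimized or sparse implementation, and the mirror-deletion and member-selection steps are omitted.

```python
# Sketch: a bare-bones Markov Cluster (MCL) iteration with NumPy.
# Toy adjacency matrix and inflation parameter; not a production implementation.
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)  # self-loops included

def normalize(M):
    return M / M.sum(axis=0, keepdims=True)   # make columns stochastic

M = normalize(A)
for _ in range(50):
    M = normalize(np.linalg.matrix_power(M, 2))   # expansion
    M = normalize(M ** 2.0)                       # inflation (r = 2, assumed)

# Columns sharing the same attractor row (largest entry) cluster together.
clusters = {}
for col in range(M.shape[1]):
    attractor = int(M[:, col].argmax())
    clusters.setdefault(attractor, []).append(col)
print(list(clusters.values()))
```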
In this paper, we improve trawling and point out some communities missed by it. We use the DBG (Dense Bipartite Graph) to identify the structure of a potential community instead of the CBG (Complete Bipartite Graph). Based on the DBG, we propose a new method based on edge removal to extract cores from a web graph. Moreover, we improve the crawler to save only potential pages as fans of a core, saving a large amount of disk storage space. To evaluate whether a set of cores belongs to a community, term frequency statistics are used. In this paper, the experimental dataset was crawled under the domain ".cn". The results show that our algorithm works properly and that some new cores can be found by our method.
Funding: Supported by the Natural Science Fund of Renmin University of China (30207108).
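The core-evaluation step above, checking whether the pages of a candidate core share a topic, can be approximated with simple term-frequency statistics. The sketch below is a toy illustration with invented page texts, not the paper's crawler or evaluation pipeline.

```python
# Sketch: judging topical coherence of a candidate core via term frequencies.
# Page texts are invented; a real pipeline would use crawled ".cn" pages.
from collections import Counter

core_pages = [
    "open source linux kernel download mirror",
    "linux distribution package mirror server",
    "kernel patches and linux mirror list",
]

tokens = Counter(t for page in core_pages for t in page.split())
top_terms = [t for t, _ in tokens.most_common(3)]
coverage = sum(any(t in page for t in top_terms) for page in core_pages) / len(core_pages)
print(top_terms, coverage)  # high coverage of shared top terms suggests one topic
```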
Ensuring a minimum operational level of road networks in the presence of unexpected incidents is becoming a hot subject in academic circles as well as in industry. To this end, it is important to understand the degree to which each single element of the network contributes to the operation and performance of the network. In other words, a road can become an "Achilles heel" for the entire network if it is closed due to a simple incident. Such insight into the detrimental loss caused by road closures helps us to be more vigilant and prepared. In this study, we develop an index dubbed the Achilles-heel index to quantify the detrimental loss of the closure of the respective roads. More precisely, the Achilles-heel index indicates how many drivers are affected by the closure of the respective roads (the number of affected drivers is also called travel demand coverage). To this end, roads with maximum travel demand coverage are sorted as the most critical ones, for which a method known as "link analysis" is adopted. In an iterative process, the road with the highest traffic volume is first labeled as the "target link," and second, the portion of travel demand captured by the target link is excluded from the travel demand. In the next iteration, the trimmed travel demand is assigned to the network, where all links, including the target links, run on the initial travel times. The process carries on until all links are labeled. The proposed methodology is applied to the large-sized network of Winnipeg, Canada. The results also shed light on bottleneck points of the network, which may warrant the provision of additional capacity or parallel roads.
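The iterative labeling loop described above can be sketched as follows: pick the link with the highest assigned volume, record the demand it captures, strip that demand, and reassign. In the sketch the assignment step is simplified to an all-or-nothing shortest-path loading on a toy network, and the loop stops once all demand is captured rather than when every link is labeled; travel times and O-D demands are invented, not the Winnipeg case study.

```python
# Sketch of the iterative "target link" loop behind the Achilles-heel index.
# Assignment is simplified to all-or-nothing shortest-path loading on a toy
# network; link travel times and O-D demands are invented, not Winnipeg data.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 2), ("B", "C", 2), ("A", "C", 6),
                           ("B", "D", 3), ("C", "D", 1)], weight="time")
demand = {("A", "C"): 100.0, ("A", "D"): 50.0, ("B", "D"): 30.0}

def path_edges(o, d):
    """Edges on the shortest path (by initial travel time) from o to d."""
    p = nx.shortest_path(G, o, d, weight="time")
    return list(zip(p, p[1:]))

achilles = {}                  # link -> travel demand coverage
remaining = dict(demand)
while remaining:
    volumes = {e: 0.0 for e in G.edges()}
    for od, q in remaining.items():
        for e in path_edges(*od):
            volumes[e] += q
    target = max(volumes, key=volumes.get)        # busiest unlabeled link
    achilles[target] = volumes[target]            # demand captured by the target
    remaining = {od: q for od, q in remaining.items()
                 if target not in path_edges(*od)}    # strip captured demand

print(sorted(achilles.items(), key=lambda kv: -kv[1]))
```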
The popularity of blogs and the amount of information in the blogosphere are increasing so fast that it is difficult for Internet users to find the information they care about. Compared with conventional webs, links in the blogosphere are more abundant and conversations between bloggers are more frequent. This paper proposes a method for ranking bloggers based on link analysis, which can exemplify the characteristics of blogs and reduce the influence of link spamming. This method also makes it more convenient for users to read blogs, and it supplies a new methodology for information retrieval in the blogosphere. To ensure the reliability of the ranking results, some evaluation indicators of important bloggers are proposed, and the grading results of bloggers obtained with the proposed method are compared with those obtained with other indicators. Finally, correlation analysis is used to verify the consistency between the proposed method and the evaluation indicators.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60435020 and 60302021).
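The consistency check mentioned at the end can be illustrated with a rank correlation between two blogger orderings. The scores below are invented, and Spearman correlation from scipy stands in as one plausible choice; the paper's actual correlation analysis and indicators are not specified here.

```python
# Sketch: checking agreement between a link-based blogger ranking and an
# indicator-based ranking via Spearman rank correlation. Scores are invented.
from scipy.stats import spearmanr

link_based_scores = {"blogA": 0.92, "blogB": 0.75, "blogC": 0.40, "blogD": 0.31}
indicator_scores  = {"blogA": 130,  "blogB": 95,   "blogC": 60,   "blogD": 72}

bloggers = sorted(link_based_scores)
rho, p_value = spearmanr([link_based_scores[b] for b in bloggers],
                         [indicator_scores[b] for b in bloggers])
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
```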
Broadband satellite communications can enable a plethora of applications in customer services, global nomadic coverage, and disaster prediction and recovery. The Terahertz (THz) band is envisioned as a key satellite communication technology due to its very broad bandwidth, its advantages for astrophysical observation, and the maturing of devices in recent years. In this paper, a massive-antenna-array-enabled THz satellite communication system is proposed to be established in Tanggula, Tibet, where the average altitude is 5.068 km and the mean clear-sky precipitable water vapor (PWV) is as low as 1.31 mm. In particular, a link budget analysis (LBA) framework is developed for THz space communications, considering unique THz channel properties and massive antenna array techniques. Moreover, practical siting conditions are taken into account, including the altitude, PWV, THz spectral windows, and rain and cloud factors. On the basis of the developed link budget model, the massive antenna array model, and the practical parameters in Tanggula, the signal-to-noise ratio (SNR) and capacity performances are evaluated. The results illustrate that 1 Tbit/s is attainable in the 0.275~0.37 THz spectral window in Tanggula, using an antenna array of size 64.
Funding: Supported by the National Natural Science Foundation of China (No. 61701300), the Shanghai Sailing (YANG FAN) Program (No. 17YF1409900), and HAN Chong's "Chenguang Program" supported by the Shanghai Education Development Foundation and Shanghai Municipal Education Commission.
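To connect the SNR figure to the capacity figure above, the Shannon bound C = B * log2(1 + SNR) can be evaluated directly. The bandwidth and SNR values below are assumptions chosen only to show the arithmetic, not outputs of the paper's link budget model.

```python
# Sketch: Shannon capacity C = B * log2(1 + SNR) for an assumed THz window.
# Bandwidth and SNR are illustrative, not results of the paper's link budget.
import math

bandwidth_hz = 95e9          # the 0.275-0.37 THz window is roughly 95 GHz wide
snr_db = 30.0                # assumed post-array SNR
snr_linear = 10 ** (snr_db / 10)

capacity_bps = bandwidth_hz * math.log2(1 + snr_linear)
print(f"Shannon capacity = {capacity_bps / 1e12:.2f} Tbit/s (upper bound)")
```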
Identifying composite crosscutting concerns (CCs) is a research task and challenge in aspect mining. In this paper, we propose a scatter-based graph clustering approach to identify composite CCs. Inspired by state-of-the-art link analysis techniques, we propose a two-state model to approximate how CCs tangle with core modules. According to this model, we obtain scatter and centralization scores for each program element. In particular, the scatter scores are adopted to select CC seeds. Furthermore, to identify composite CCs, we adopt a novel similarity measurement and develop an undirected graph clustering to group these seeds. Finally, we compare the approach with previous work and illustrate its effectiveness in identifying composite CCs.
Funding: Supported by the National Pre-research Project (513150601).
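As a hedged illustration of the final grouping step, the sketch below builds an undirected similarity graph over CC seeds and takes its connected components as composite-CC candidates. The Jaccard similarity over call sites, the threshold, and the seed call-sets are invented stand-ins; they are not the paper's similarity measurement or two-state model.

```python
# Sketch: grouping CC seeds by clustering an undirected similarity graph.
# Jaccard similarity over call sites and a 0.3 threshold are assumptions,
# not the paper's similarity measurement or two-state model.
import networkx as nx

seeds = {
    "log_entry":  {"modA.f1", "modB.g2", "modC.h3"},
    "log_exit":   {"modA.f1", "modB.g2", "modC.h4"},
    "check_auth": {"modA.f1", "modD.k1"},
    "open_txn":   {"modE.m1", "modF.n2"},
    "commit_txn": {"modE.m1", "modF.n2", "modF.n3"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

G = nx.Graph()
G.add_nodes_from(seeds)
for s1 in seeds:
    for s2 in seeds:
        if s1 < s2 and jaccard(seeds[s1], seeds[s2]) >= 0.3:
            G.add_edge(s1, s2)

# Each connected component of the similarity graph is a composite-CC candidate.
print([sorted(c) for c in nx.connected_components(G)])
```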