Journal Articles
2 articles found
1. O2iJoin: An Efficient Index-Based Algorithm for Overlap Interval Join (Citations: 1)
Authors: Ji-Zhou Luo, Sheng-Fei Shi, Guang Yang, Hong-Zhi Wang, Jian-Zhong Li. Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2018, Issue 5, pp. 1023-1038 (16 pages).
Time intervals are often associated with tuples to represent their valid time in temporal relations, where overlap join is crucial for various kinds of queries. Many existing overlap join algorithms use indices based on tree structures such as the quad-tree, B+-tree, and interval tree. These algorithms usually incur high CPU cost because deep path traversals are unavoidable, which makes them less competitive than data-partition or plane-sweep based algorithms. This paper proposes an efficient overlap join algorithm based on a new two-layer flat index named the Overlap Interval Inverted Index (O2i Index). The first layer uses an array to record the end points of intervals and approximates the nesting structures of intervals via two functions. The second layer uses inverted lists to trace all intervals that satisfy the approximated nesting structures. With the help of the new index, the join algorithm visits only the lists that must be scanned and skips all others. Analyses and experiments on both real and synthetic datasets show that the proposed algorithm is as competitive as the state-of-the-art algorithms.
Keywords: overlap interval join; temporal relation; overlap inverted index; join algorithm
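The O2iJoin abstract names plane-sweep based algorithms as the competitive baseline. For reference only, here is a minimal Python sketch of a generic forward-scan plane-sweep overlap join. It assumes closed integer intervals stored as hypothetical `(start, end, id)` tuples. This is not the O2i Index or the O2iJoin algorithm: the abstract does not specify its two approximation functions.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, tuple_id), closed interval with start <= end.
Interval = Tuple[int, int, str]


def overlaps(a: Interval, b: Interval) -> bool:
    """Closed intervals [s1, e1] and [s2, e2] overlap iff s1 <= e2 and s2 <= e1."""
    return a[0] <= b[1] and b[0] <= a[1]


def plane_sweep_overlap_join(r: List[Interval], s: List[Interval]) -> List[Tuple[str, str]]:
    """Forward-scan plane sweep: report each overlapping (r_id, s_id) pair exactly once."""
    r = sorted(r)  # ascending by start (ties broken by end, then id)
    s = sorted(s)
    i = j = 0
    result = []
    while i < len(r) and j < len(s):
        if r[i][0] <= s[j][0]:
            # r[i] starts no later than every unprocessed s, so an unprocessed
            # s overlaps r[i] exactly when it starts at or before r[i]'s end.
            k = j
            while k < len(s) and s[k][0] <= r[i][1]:
                result.append((r[i][2], s[k][2]))
                k += 1
            i += 1
        else:
            # Symmetric case: s[j] starts strictly before every unprocessed r.
            k = i
            while k < len(r) and r[k][0] <= s[j][1]:
                result.append((r[k][2], s[j][2]))
                k += 1
            j += 1
    return result
```

For example, `plane_sweep_overlap_join([(1, 5, "r1"), (6, 9, "r2")], [(4, 7, "s1"), (10, 12, "s2")])` returns `[("r1", "s1"), ("r2", "s1")]`. Each pair is reported when the interval with the smaller start is processed, so no deduplication is needed.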
2. FrepJoin: An efficient partition-based algorithm for edit similarity join
Authors: Ji-Zhou Luo, Sheng-Fei Shi, Hong-Zhi Wang, Jian-Zhong Li. Frontiers of Information Technology & Electronic Engineering (indexed in SCIE, EI, CSCD), 2017, Issue 10, pp. 1499-1510 (12 pages).
String similarity join (SSJ) is essential for many applications in which near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. Existing algorithms usually adopt the filter-and-refine framework. However, they cannot capture the dissimilarity between string subsets, and they do not fully exploit statistics such as character frequencies. We develop a partition-based algorithm that uses such statistics. Frequency vectors are used to partition datasets into data chunks so that the dissimilarity between chunks can be captured easily. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed that leverages these statistics to avoid computing edit distances for a noticeable proportion of the candidate pairs that survive the existing filters. Our algorithm notably outperforms alternative methods on real datasets.
Keywords: String similarity join; Edit distance; Filter and refine; Data partition; Combined frequency vectors
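The FrepJoin abstract builds on the filter-and-refine framework and character-frequency statistics. Below is a minimal Python sketch of that general framework for a self-join under an edit distance threshold `tau`, using hypothetical function names.

The sketch works in three stages:

- A cheap length filter prunes pairs whose lengths differ too much.
- A well-known frequency-distance lower bound prunes more pairs. Let P and N be the sums of positive and negative per-character frequency differences. Each insert, delete, or substitute lowers P and N by at most 1 each, so max(P, N) never exceeds the edit distance.
- Surviving pairs are verified with Levenshtein dynamic programming.

This is not FrepJoin. It does no frequency-vector partitioning and does not implement the paper's new filter.

```python
from collections import Counter
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) using one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def frequency_lower_bound(fa: Counter, fb: Counter) -> int:
    """max(P, N) over character-frequency differences; a lower bound on edit distance."""
    pos = sum((fa - fb).values())  # Counter subtraction keeps only positive counts
    neg = sum((fb - fa).values())
    return max(pos, neg)


def similarity_join(strings: List[str], tau: int) -> List[Tuple[int, int]]:
    """Self-join: return index pairs (i, j) with i < j and edit distance <= tau."""
    freqs = [Counter(s) for s in strings]
    result = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            a, b = strings[i], strings[j]
            if abs(len(a) - len(b)) > tau:                       # filter 1: length
                continue
            if frequency_lower_bound(freqs[i], freqs[j]) > tau:  # filter 2: frequency
                continue
            if edit_distance(a, b) <= tau:                       # refine: exact check
                result.append((i, j))
    return result
```

For example, `similarity_join(["kitten", "sitting", "mitten", "abc"], 2)` returns `[(0, 2)]`. Here `("kitten", "sitting")` is pruned by the frequency bound of 3 without ever running the DP. The length filter is implied by the frequency bound, because |len(a) - len(b)| = |P - N| ≤ max(P, N). It is kept only because it is cheaper to evaluate.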