In data streams or web scenarios at highly variable and unpredictable rates, a good join algorithm should be able to "hide" the delays by continuing to output join results. The non-blocking algorithms allow some tup...In data streams or web scenarios at highly variable and unpredictable rates, a good join algorithm should be able to "hide" the delays by continuing to output join results. The non-blocking algorithms allow some tuples to be flushed onto disk, with the goal of producing results continuously when data transmission is suspended. But state-of-the-art algorithms have trouble with the constraint of allocated memory. To make better use of memory, a novel non-blocking join algorithm based on hash-merge for improving query response times is proposed. The reduced data structure of in-memory tuples helps to improve memory utility. A replacement selection tree is applied to adjust memory by expanding or shrinking the size of the tree and separates one external join transaction into multi-subtasks. In addition, a cost model to estimate task output rate is proposed to select the in-disk portion that promises to produce the fastest results in the external join stage. Experiments show that the technique, with far less memory, delivers results faster than the three non-blocking join algorithms ( XJoin, HMJ and RPJ ) , with up to almost two-fold improvement in reliable network and one order of magnitude improvement in unreliable network in terms of the number of the reported tuples.展开更多
基金The National High Technology Research and Development Program of China(No.2007AA01Z309)the National Natural Science Foundation of China(No.60803160,No.60873030)
文摘In data streams or web scenarios at highly variable and unpredictable rates, a good join algorithm should be able to "hide" the delays by continuing to output join results. The non-blocking algorithms allow some tuples to be flushed onto disk, with the goal of producing results continuously when data transmission is suspended. But state-of-the-art algorithms have trouble with the constraint of allocated memory. To make better use of memory, a novel non-blocking join algorithm based on hash-merge for improving query response times is proposed. The reduced data structure of in-memory tuples helps to improve memory utility. A replacement selection tree is applied to adjust memory by expanding or shrinking the size of the tree and separates one external join transaction into multi-subtasks. In addition, a cost model to estimate task output rate is proposed to select the in-disk portion that promises to produce the fastest results in the external join stage. Experiments show that the technique, with far less memory, delivers results faster than the three non-blocking join algorithms ( XJoin, HMJ and RPJ ) , with up to almost two-fold improvement in reliable network and one order of magnitude improvement in unreliable network in terms of the number of the reported tuples.