Parsimonious Temporal Aggregation
Authors
- Juozas Gordevicius (Free University of Bozen-Bolzano, Italy)
- Johann Gamper (Free University of Bozen-Bolzano, Italy)
- Michael Boehlen (Free University of Bozen-Bolzano, Italy)
Abstract
Temporal aggregation is a crucial operator in temporal databases and has been studied in various flavors, including instant temporal aggregation (ITA) and span temporal aggregation (STA), each with its own strengths and weaknesses. In this paper we define a new temporal aggregation operator, called parsimonious temporal aggregation (PTA), which comprises two main steps: (i) it computes the ITA result over the input relation, and (ii) it compresses this intermediate result to a user-specified size c by merging adjacent tuples while keeping the induced total error minimal; the compressed ITA result is returned as the final result. By taking the distribution of the input data into account and allowing the result size to be controlled, PTA combines the best features of ITA and STA. We provide two evaluation algorithms for PTA queries. First, the oPTA algorithm computes an exact solution by applying dynamic programming to explore all ways of compressing the ITA result and selecting the compression with the minimal total error. It runs in O(n^2 p c) time and O(n^2) space, where n is the size of the input relation and p is the number of aggregation functions used. Second, the more efficient gPTA algorithm computes an approximate solution by greedily merging the most similar ITA result tuples, which, however, does not guarantee a compression with minimal total error. gPTA interleaves the two steps of PTA and thereby avoids large intermediate results. The compression step of gPTA runs in O(n p log(c + delta)) time and O(c + delta) space, where delta is a small buffer for “look ahead”. An empirical evaluation on both synthetic and real-world data shows promising results: considerable reductions of the result size introduce only small errors, and gPTA scales to large data sets while being only slightly worse than the exact solution of PTA.
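The compression step described in the abstract can be illustrated with a small sketch. This is not the authors' gPTA algorithm: it is a naive greedy merge that rescans all adjacent pairs on every step, has no look-ahead buffer, and handles only a single aggregate value (p = 1). The tuple layout (start, end, value) with half-open intervals, the length-weighted average used as the merged value, and the squared-error cost are all assumptions made for illustration, as are the names `merge`, `merge_cost`, and `greedy_compress`.

```python
from typing import List, Tuple

# Hypothetical layout of one ITA result tuple: half-open interval
# [start, end) and a single aggregate value (assumption, p = 1).
Segment = Tuple[int, int, float]


def merge(a: Segment, b: Segment) -> Segment:
    """Merge two adjacent segments; the value is the length-weighted mean."""
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    value = (len_a * a[2] + len_b * b[2]) / (len_a + len_b)
    return (a[0], b[1], value)


def merge_cost(a: Segment, b: Segment) -> float:
    """Length-weighted squared error introduced by merging a and b (assumed error measure)."""
    merged_value = merge(a, b)[2]
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    return len_a * (a[2] - merged_value) ** 2 + len_b * (b[2] - merged_value) ** 2


def greedy_compress(segments: List[Segment], c: int) -> List[Segment]:
    """Greedily merge the cheapest adjacent pair until at most c segments remain.

    Segments must be sorted by start time. Only pairs that meet without a
    gap (a.end == b.start) are merge candidates, so the result may contain
    more than c segments if gaps prevent further merging.
    """
    if c < 1:
        raise ValueError("c must be at least 1")
    result = list(segments)
    while len(result) > c:
        # Indices i such that result[i] and result[i + 1] are adjacent in time.
        candidates = [
            i for i in range(len(result) - 1)
            if result[i][1] == result[i + 1][0]
        ]
        if not candidates:
            break  # nothing left that can be merged
        best = min(candidates, key=lambda i: merge_cost(result[i], result[i + 1]))
        result[best:best + 2] = [merge(result[best], result[best + 1])]
    return result


if __name__ == "__main__":
    ita_result = [(0, 2, 10.0), (2, 4, 10.5), (4, 6, 30.0), (6, 8, 31.0)]
    print(greedy_compress(ita_result, 2))
```

With the sample input, the two low-valued segments and the two high-valued segments are merged first, because those merges introduce the smallest error. Each step of this naive version scans all remaining pairs, so it is quadratic in the number of ITA tuples; the bounds quoted in the abstract for gPTA depend on techniques that this sketch does not attempt to reproduce.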
Session
EDBT Research Session 28: Spatio-Temporal (Thursday, March 26, 16:00–17:30)