Flexible and Scalable Storage Management for Data-intensive Stream Processing
Authors
- Irina Botan (ETH Zurich, Switzerland)
- Gustavo Alonso (ETH Zurich, Switzerland)
- Peter Fischer (ETH Zurich, Switzerland)
- Donald Kossmann (ETH Zurich, Switzerland)
- Nesime Tatbul (ETH Zurich, Switzerland)
Abstract
Data Stream Management Systems (DSMSs) operate under strict performance requirements. Key to meeting these requirements is the efficient handling of time-critical tasks: managing the internal state of continuous query operators, managing traffic on the queues between operators, and providing storage support for shared computation and archived data. In this paper, we introduce a general-purpose storage management framework for DSMSs that performs these tasks based on a clean, loosely coupled, and flexible system design that also facilitates performance optimization. An important contribution of the framework is that, in analogy to buffer management techniques in relational database systems, it uses information about the access patterns of streaming applications to tune and customize the performance of the storage manager. We first analyze typical application requirements at different granularities in order to identify important tunable parameters and their corresponding values. Based on these parameters, we define a general-purpose storage management interface. Using this interface, a developer can use our SMS (Storage Manager for Streams) to generate a customized storage manager for streaming applications. We explore the performance and potential of SMS through a set of experiments using the Linear Road benchmark.
Session
EDBT Research Session 26: Potpourri (Thursday, March 26, 14:00-15:30)