A Framework For Supporting Quality Of Service Requirements In A Data Stream Management System
Currently, a large class of data-intensive applications, in which data are presented in the form of continuous data streams rather than static relations, has been widely recognized in the database community. Not only is the size of the data for these applications unbounded and the data arrives in a highly bursty mode, but these applications have to conform to Quality of Service (QoS) requirements for processing continuous queries (CQs) over data streams. These characteristics make it infeasible to simply load the arriving data streams into a traditional database management system and use currently available techniques for their processing. Therefore, a data stream management system (DSMS) is needed to process continuous streaming data effectively and efficiently. In this thesis, we discuss and provide solutions to many aspects of a DSMS with the emphasis on supporting QoS requirements and event and rule processing. Specifically, we address the following problems: 1) System Capacity Planning and QoS Metrics Estimation: We propose a queueing theory based model for analyzing a multiple continuous query processing system. Using our queueing model, we provide a solution to the system capacity planning problem and its reverse problem: given the resources and CQs, how to estimate QoS metrics? The estimated QoS metrics not only can be used to verify whether the defined QoS requirements of CQs in a DSMS have been satisfied, but also form the base in a DSMS to manage and control various QoS delivery mechanisms such as scheduling strategies, load shedding, admission control, and others. 2) Run-Time Resource Allocation (Scheduling Strategies): We propose a family of scheduling strategies for run-time resource allocation in DSMSs, which includes the operator path capacity strategy (PC) to minimize the overall tuple latency, the operator segment strategy and its variances: the memory-optimal segment strategy (MOS), which minimizes the total memory requirement, and the simplified segment strategy, and the threshold strategy, a hybrid of the PC and the MOS strategy. 3) QoS Delivery Mechanism (Load Shedding): We develop a set of comprehensive techniques to handle the bursty nature of input data streams by activating/deactivating a set of shedders to gracefully discard tuples during overload periods in a general DSMS. We first formalize the problem and discuss the physical implementation of shedders. We then develop a set of algorithms to estimate system capacity, to compute the optimal location of shedders in CQs, and to allocate the total shedding load among non-active shedders. 4) Event and Rule Processing We develop an integrated model, termed Estream, to process complicated event expressions and rules under the context of data stream processing through a group of enhancements to a DSMS. Our Estream model greatly improves the expressiveness and computation ability of DSMSs in terms of processing complex real-life events and makes DSMSs actively respond to defined events over data streams and carry out defined sequences of actions automatically. Our algorithms and solutions developed in this thesis can be used individually to assist QoS support in a DSMS. The most important contribution of this thesis is that these algorithms and solutions form a framework for supporting QoS requirements QoS requirements in a general DSMS. Finally, the theoretical analysis is validated using a prototype implementation. We have prototyped the proposed solutions, algorithms, and techniques developed in this thesis in a general DSMS, termed MavStream, in C++.