Top K Query Processing In Distributed Database

Date

2007-09-19T21:53:12Z

Publisher

Computer Science & Engineering

Abstract

Today's data is rarely stored in a single centralized location, both because of the enormous volume of information involved and because distribution increases the reliability, availability, and performance of the system. The same data may be stored in different formats in different companies' databases, and it may also be partitioned or replicated. We consider several distributed database scenarios, including horizontal fragmentation, vertical fragmentation, and attribute overlap. Providing access to integrated information from these multiple datasets gives the end user a more accurate and complete picture. We study how to query these distributed databases efficiently to retrieve the top k elements under a ranking function supplied by the user. We also discuss the hierarchical application of top-k algorithms and its limitations for our problem. We propose four algorithms based on the NRA algorithm to solve this problem efficiently, and we compare and contrast them. Once the combination of data sources has been identified, we use our algorithms to retrieve the top elements from that combination and process them to obtain the top k elements according to the user's ranking function.
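
For readers unfamiliar with NRA (No Random Access), the following is a minimal sketch of the classic algorithm on which the abstract says the proposed methods are based. It is not one of the four algorithms of this work, whose details the abstract does not give. The sketch assumes a vertically partitioned setting in which each source returns (object, score) pairs in descending score order, scores are non-negative, and the user's ranking function is a plain sum; the function name `nra_top_k` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Source = List[Tuple[str, float]]  # (object id, score), sorted by descending score


def nra_top_k(sources: List[Source], k: int) -> List[Tuple[str, float]]:
    """Classic NRA with sum aggregation (assumption: all scores >= 0).

    Returns up to k (object, lower-bound score) pairs. NRA guarantees the
    right objects, but the reported scores may be lower bounds only.
    """
    m = len(sources)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {source index: score}
    bottoms = [0.0] * m                     # last score read from each source
    max_depth = max((len(s) for s in sources), default=0)

    def worst(obj: str) -> float:
        # Lower bound: unknown scores count as 0.
        return sum(seen[obj].values())

    def best(obj: str) -> float:
        # Upper bound: unknown scores can be at most the source's bottom value.
        return sum(seen[obj].get(i, bottoms[i]) for i in range(m))

    def current_top() -> List[str]:
        return sorted(seen, key=lambda o: (-worst(o), o))[:k]

    for depth in range(max_depth):
        # One round of sorted access on every source, no random access.
        for i, src in enumerate(sources):
            if depth < len(src):
                obj, score = src[depth]
                seen.setdefault(obj, {})[i] = score
                bottoms[i] = score
            else:
                bottoms[i] = 0.0  # exhausted source contributes nothing more
        if len(seen) < k:
            continue
        top = current_top()
        threshold = worst(top[-1])
        unseen_ok = sum(bottoms) <= threshold
        others_ok = all(best(o) <= threshold for o in seen if o not in top)
        if unseen_ok and others_ok:
            break  # no other object can overtake the current top k

    return [(o, worst(o)) for o in current_top()]


# Hypothetical data: two sources, each ranking the same four objects.
source_a: Source = [("a", 9.0), ("b", 8.0), ("c", 1.0), ("d", 0.0)]
source_b: Source = [("b", 9.0), ("a", 5.0), ("d", 3.0), ("c", 2.0)]
top_two = nra_top_k([source_a, source_b], 2)
```

On this sample the loop stops after reading two entries per source, because by then no unseen or partially seen object can exceed the lower bound of the second-ranked object.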