A Random Walk Approach To Sampling Hidden Databases
dc.contributor | Dasgupta, Arjun | en_US |
dc.date.accessioned | 2007-08-23T01:56:03Z | |
dc.date.accessioned | 2011-08-24T21:39:43Z | |
dc.date.available | 2007-08-23T01:56:03Z | |
dc.date.available | 2011-08-24T21:39:43Z | |
dc.date.issued | 2007-08-23T01:56:03Z | |
dc.date.submitted | April 2007 | en_US |
dc.description.abstract | A large part of the data on the World Wide Web is hidden behind form-like interfaces. These interfaces interact with a hidden back-end database to answer user queries. Generating a uniform random sample of such a hidden database using only its publicly available interface gives access to the underlying data distribution. In this thesis, we propose a random walk scheme over the query space exposed by the interface to sample such databases. We discuss variants in which the query space is viewed as either a fixed or a random ordering of attributes. We also propose a probabilistic, rejection-based technique to further improve sample quality, and we conduct extensive experiments to demonstrate the accuracy and efficiency of our techniques. | en_US |
dc.identifier.uri | http://hdl.handle.net/10106/96 | |
dc.language.iso | EN | en_US |
dc.publisher | Computer Science & Engineering | en_US |
dc.title | A Random Walk Approach To Sampling Hidden Databases | en_US |
dc.type | M.S. | en_US |
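The abstract describes the sampler only at a high level. The following is a minimal sketch of a random walk with rejection. It assumes a hidden table of distinct rows over categorical attributes, and a hypothetical top-k interface (`TopKInterface`) that answers conjunctive attribute = value queries with at most k rows plus an overflow flag. The names, the acceptance rule min(1, C/p), and the constant C are illustrative assumptions, not the thesis's exact algorithm. The walk adds one randomly valued attribute at a time, following either a fixed or a shuffled attribute order, until a query stops overflowing. If the query returns nothing, the walk restarts. Otherwise it picks a returned row and accepts it with probability inversely proportional to the chance of having reached that row.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[int, ...]


class TopKInterface:
    """Hypothetical stand-in for a form-like interface over a hidden table.

    A query is a dict {attribute_index: value}. The interface returns at most
    k matching rows and a flag telling whether more rows matched (overflow).
    """

    def __init__(self, rows: Sequence[Row], k: int):
        self._rows = list(rows)  # assumed distinct
        self.k = k

    def query(self, conds: Dict[int, int]) -> Tuple[List[Row], bool]:
        matches = [r for r in self._rows
                   if all(r[a] == v for a, v in conds.items())]
        return matches[: self.k], len(matches) > self.k


def random_walk_sample(db: TopKInterface, domains: Sequence[Sequence[int]],
                       c: float, rng: random.Random,
                       random_order: bool = False,
                       max_walks: int = 100_000) -> Optional[Row]:
    """Draw one row via random walks over the query space (sketch)."""
    n_attrs = len(domains)
    for _ in range(max_walks):
        order = list(range(n_attrs))
        if random_order:
            rng.shuffle(order)  # random-ordering variant; else fixed order
        conds: Dict[int, int] = {}
        path_prob = 1.0  # probability of the value choices made so far
        rows, overflow = db.query(conds)
        depth = 0
        # Drill down while the current query returns too many rows.
        while overflow and depth < n_attrs:
            attr = order[depth]
            conds[attr] = rng.choice(domains[attr])
            path_prob /= len(domains[attr])
            rows, overflow = db.query(conds)
            depth += 1
        if not rows:
            continue  # underflow: the query matched nothing, restart
        row = rng.choice(rows)
        p = path_prob / len(rows)  # chance this walk returned `row`
        # Rejection step: accept with probability proportional to 1/p.
        if rng.random() < min(1.0, c / p):
            return row
    return None  # gave up after max_walks walks
```

In this sketch, any C no larger than the smallest possible p makes every row equally likely. With distinct rows, one such choice is C = 1 / (product of all domain sizes × k). Under these assumptions that holds for both orderings, because the guarantee applies to each attribute order separately. Choosing a larger C would accept more walks but would reintroduce bias toward rows that are reached rarely.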