Browsing by Subject "clustering"
Now showing 1 - 6 of 6
Item: A Biologically Inspired Networking Model for Wireless Sensor Networks (2011-02-22)
Author: Charalambous, Charalambos

Wireless sensor networks (WSNs) have emerged in strategic applications such as target detection, localization, and tracking in battlefields, where their large-scale nature renders centralized control prohibitive. In addition, the finite batteries in sensor nodes demand energy-aware network control. In this thesis, we propose an energy-efficient topology management model inspired by biological inter-cellular signaling schemes. The model allows sensor nodes to cluster around imminent targets in a purely distributed and autonomous fashion. In particular, nodes in the target vicinity collaborate to form clusters based on their relative observation quality values. Subsequently, the clustered sensor nodes compete based on their energy levels until some of them gain active status while the rest remain idle, again according to a distributed algorithm based on biological processes. In a final phase of the model, the active cluster members compete until one of them becomes the clusterhead. We examine the behavior of this model in both finite-size and infinite-size networks. Specifically, we show that the proposed model is inherently stable and achieves superior energy efficiency against reference protocols for networks of finite size. Furthermore, we discuss the behavior of the model in the asymptotic case where the number of nodes goes to infinity. In this setting, we study the average number of cluster members.

Item: Causality and aggregation in economics: the use of high dimensional panel data in micro-econometrics and macro-econometrics (2009-05-15)
Author: Kwon, Dae-Heum

This study proposes a plausible procedure to address two methodological issues, common in micro- and macro-econometric analyses, in order to fully realize the research potential brought by recently available high-dimensional data.
To address the issue of how to infer causal structure from empirical regularities, graphical causal models are proposed to inductively infer causal structure from non-temporal and non-experimental data. However, the (probabilistic) stability condition for graphical causal models can be violated in high-dimensional data, given that close co-movements, and thus near-deterministic relations, are often observed among variables in such data. Aggregation methods are proposed as one possible way to address this matter, allowing one to infer causal relationships among disaggregated variables based on aggregated variables. Aggregation methods are also helpful for the issue of how to incorporate a large information set into an empirical model, given that econometric considerations, such as degrees of freedom and multicollinearity, require an economy of parameters in empirical models. However, actual aggregation requires legitimate classifications for interpretable and consistent aggregation. Based on the generalized condition for consistent and interpretable aggregation, derived from aggregation theory and statistical dimensional methods, we propose a plausible methodological procedure to consistently address the two related issues of causal inference and actual aggregation. Additional issues for empirical studies in micro-economics and macro-economics are also discussed. The proposed procedure provides inductive guidance on the specification choice among direct, inverse, and mixed demand systems, and an inverse demand system, which is statistically supported, is identified for the consumer behavior of soft drink consumption. The proposed procedure also provides ways to incorporate a large information set into an empirical model while allowing a structural understanding of the U.S. macro-economy, which was difficult to obtain under the previously used factor-augmented vector autoregressive (FAVAR) framework.
The empirical results suggest the plausibility of the proposed method for incorporating large information sets into empirical studies by inductively addressing the multicollinearity problem in high-dimensional data.

Item: Clustering and Inconsistent Information: A Kernelization Approach (2012-07-16)
Author: Cao, Yixin

Clustering is the unsupervised classification of patterns into groups, which is easy provided the pattern data are consistent. However, real data are almost always tainted with inconsistencies, which make clustering a hard problem; indeed, the most widely studied formulations, correlation clustering and hierarchical clustering, are both NP-hard. In the graph representation of data, inconsistencies also frequently present themselves as cycles, also called deadlocks, and breaking cycles by removing vertices is the objective of the classical feedback vertex set (FVS) problem. This dissertation studies three problems, correlation clustering, hierarchical clustering, and disjoint-FVS (a variation of FVS), from a kernelization approach. A kernelization algorithm provably reduces the size of a problem instance in polynomial time, speeding up further processing by other approaches. For each of the problems studied, an efficient kernelization algorithm with linear or sub-quadratic running time is presented. All the kernels obtained in this dissertation have linear size with very small constants. Better parameterized algorithms, based on these kernels, are also designed for the last two problems. Finally, some concluding remarks on possible directions for future research are briefly mentioned.

Item: Clustering-Based Simultaneous Task and Voltage Scheduling for NoC Systems (2011-08-08)
Author: Yang, Yu

Network-on-Chip (NoC) is emerging as a promising communication structure that is scalable with respect to chip complexity. Meanwhile, the latest chip designs increasingly leverage multiple voltage-frequency domains to improve energy efficiency.
In this work, we propose a simultaneous task and voltage scheduling algorithm for energy minimization in NoC-based designs. The energy-latency tradeoff is handled by Lagrangian relaxation. The core algorithm is a clustering-based approach that not only assigns voltage levels and starting times to each task (or Processing Element) but also naturally finds voltage-frequency clusters. Compared to a recent previous work, which performs task scheduling and voltage assignment sequentially, our method achieves an average energy reduction of 20 percent.

Item: Optimization-Based Network Analysis with Applications in Clustering and Data Mining (2013-07-22)
Author: Shahinpour, Shahram

In this research we develop theoretical foundations and efficient solution methods for two classes of cluster-detection problems from an optimization point of view. In particular, the s-club model and the biclique model are considered due to their various application areas. An analytical review of the optimization problems is followed by theoretical results and algorithmic solution methods developed in this research. The maximum s-club problem has applications in graph-based data mining and robust network design, where high reachability is often considered a critical property. The massive size of real-life instances makes it necessary to devise a scalable solution method for practical purposes. Moreover, the lack of a heredity property in s-clubs imposes challenges in the design of optimization algorithms. Motivated by these properties, a sufficient condition for checking the maximality, by inclusion, of a given s-club is proposed. This sufficient condition can be employed in the design of optimization algorithms to reduce the computational effort. A variable neighborhood search algorithm is proposed for the maximum s-club problem to facilitate the solution of large instances with reasonable computational effort. In addition, a hybrid exact algorithm has been developed for the problem.
Inspired by the wide use of bipartite graphs in modeling and data mining, we consider three classes of the maximum biclique problem: specifically, the maximum edge biclique, the maximum vertex biclique, and the maximum balanced biclique problems. Asymptotic lower and upper bounds on the size of these structures in uniform random graphs are developed. These bounds offer insight into the evolution and growth rate of bicliques in large-scale graphs. To overcome the computational difficulty of solving large instances, a scale-reduction technique for the maximum vertex and maximum edge biclique problems in general graphs is proposed. The procedure shrinks the underlying network by confirming and removing edges that cannot be in the optimal solution, thus enabling exact solution methods to solve large-scale sparse instances to optimality. Also, a combinatorial branch-and-bound algorithm is developed that is best suited to dense instances, where the scale-reduction method might be less effective. The proposed algorithms are flexible and, with small modifications, can solve the weighted versions of the problems.

Item: Probabilistic Simhash Matching (2012-10-19)
Author: Sood, Sadhan

Finding near-duplicate documents is an interesting problem, but existing methods are not suitable for large-scale datasets and memory-constrained systems. In this work, we developed approaches that tackle the problem of finding near-duplicates while improving query performance and using less memory. We then evaluated our method on a dataset of 70M web documents and showed that it performs well. The results indicated that our method can achieve a reduction in space by a factor of 5 while improving query time by a factor of 4, with a recall of 0.95 for finding all near-duplicates when the dataset is in memory.
With the same recall and the same reduction in space, we can achieve an improvement in query time by a factor of 4.5 when finding only the first near-duplicate for an in-memory dataset. When the dataset is stored on disk, we achieve a performance improvement by a factor of 7 for finding all near-duplicates and by a factor of 14 when finding the first near-duplicate.
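The abstract above builds on the standard simhash fingerprinting scheme: each document is reduced to a short bit fingerprint, and two documents are near-duplicates when their fingerprints differ in few bit positions. The following is a minimal sketch of that generic scheme only, not the dissertation's actual method or parameters; the MD5 token hash, the 64-bit fingerprint width, the whitespace tokenizer, and the distance threshold k=3 are illustrative assumptions.

```python
import hashlib

BITS = 64  # fingerprint width (an illustrative assumption)

def simhash(tokens):
    """Standard simhash: each token's hash votes +1/-1 on every bit
    position; the sign of each tally gives the fingerprint bit."""
    tally = [0] * BITS
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        for i in range(BITS):
            tally[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if tally[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def near_duplicate_pairs(docs, k=3):
    """Brute-force baseline: report document pairs whose fingerprints
    differ in at most k bits. Indexed simhash methods, like the ones
    the abstract improves on, avoid this all-pairs scan."""
    fps = [(name, simhash(text.split())) for name, text in docs.items()]
    return [(a, b)
            for i, (a, fa) in enumerate(fps)
            for b, fb in fps[i + 1:]
            if hamming(fa, fb) <= k]
```

Documents with identical token multisets get identical fingerprints (distance 0), and small edits flip only a few bits, which is what makes fingerprint distance a usable proxy for document similarity at scale.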