Optimization-Based Network Analysis with Applications in Clustering and Data Mining

Date

2013-07-22

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

In this research we develop theoretical foundations and efficient solution methods for two classes of cluster-detection problems from optimization point of view. In particular, the s-club model and the biclique model are considered due to various application areas. An analytical review of the optimization problems is followed by theoretical results and algorithmic solution methods developed in this research.

The maximum s-club problem has applications in graph-based data mining and robust network design where high reachability is often considered a critical property. Massive size of real-life instances makes it necessary to devise a scalable solution method for practical purposes. Moreover, lack of heredity property in s-clubs imposes challenges in the design of optimization algorithms. Motivated by these properties, a sufficient condition for checking maximality, by inclusion, of a given s-club is proposed. The sufficient condition can be employed in the design of optimization algorithms to reduce the computational effort. A variable neighborhood search algorithm is proposed for the maximum s-club problem to facilitate the solution of large instances with reasonable computational effort. In addition, a hybrid exact algorithm has been developed for the problem.

Inspired by wide usability of bipartite graphs in modeling and data mining, we consider three classes of the maximum biclique problem. Specifically, the maximum edge biclique, the maximum vertex biclique and the maximum balanced biclique problems are considered. Asymptotic lower and upper bounds on the size of these structures in uniform random graphs are developed. These bounds are insightful in understanding the evolution and growth rate of bicliques in large-scale graphs. To overcome the computational difficulty of solving large instances, a scale-reduction technique for the maximum vertex and maximum edge biclique problems, in general graphs, is proposed. The procedure shrinks the underlying network, by confirming and removing edges that cannot be in the optimal solution, thus enabling the exact solution methods to solve large-scale sparse instances to optimality. Also, a combinatorial branch-and-bound algorithm is developed that best suits to solve dense instances where scale-reduction method might be less effective. Proposed algorithms are flexible and, with small modifications, can solve the weighted versions of the problems.

Description

Citation