Browsing by Subject "Semidefinite programming"
Now showing 1 - 2 of 2
Results Per Page
Sort Options
Item Overlapping community detection in massive social networks(2015-12) Whang, Joyce Jiyoung; Dhillon, Inderjit S.; Grauman, Kristen; Mooney, Raymond J; Pingali, Keshav; Gleich, David FMassive social networks have become increasingly popular in recent years. Community detection is one of the most important techniques for the analysis of such complex networks. A community is a set of cohesive vertices that has more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this thesis, we propose scalable overlapping community detection algorithms that effectively identify high quality overlapping communities in various real-world networks. We first develop an efficient overlapping community detection algorithm using a seed set expansion approach. The key idea of this algorithm is to find good seeds and then greedily expand these seeds using a personalized PageRank clustering scheme. Experimental results show that our algorithm significantly outperforms other state-of-the-art overlapping community detection methods in terms of run time, cohesiveness of communities, and ground-truth accuracy. To develop more principled methods, we formulate the overlapping community detection problem as a non-exhaustive, overlapping graph clustering problem where clusters are allowed to overlap with each other, and some nodes are allowed to be outside of any cluster. To tackle this non-exhaustive, overlapping clustering problem, we propose a simple and intuitive objective function that captures the issues of overlap and non-exhaustiveness in a unified manner. To optimize the objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using a low-rank semidefinite programming technique. Our experimental results show that the new objective and the algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness. We extend our non-exhaustive, overlapping clustering techniques to co-clustering where the goal is to simultaneously identify a clustering of the rows as well as the columns of a data matrix. As an example application, consider recommender systems where users have ratings on items. This can be represented by a bipartite graph where users and items are denoted by two different types of nodes, and the ratings are denoted by weighted edges between the users and the items. In this case, co-clustering would be a simultaneous clustering of users and items. We propose a new co-clustering objective function and an efficient co-clustering algorithm that is able to identify overlapping clusters as well as outliers on both types of the nodes in the bipartite graph. We show that our co-clustering algorithm is able to effectively capture the underlying co-clustering structure of the data, which results in boosting the performance of a standard one-dimensional clustering. Finally, we study the design of parallel data-driven algorithms, which enables us to further increase the scalability of our overlapping community detection algorithms. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling. We investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations finding that data-driven, push-based algorithms are able to achieve a significantly superior scalability than standard PageRank implementations. The design choices affect both single-threaded performance as well as parallel scalability. The lessons learned from this study not only guide efficient implementations of many graph mining algorithms but also provide a framework for designing new scalable algorithms, especially for large-scale community detection.Item Strong lower bounds on generic convex relaxations(2016-08) Kothari, Pravesh Kumar; Klivans, Adam R.; Barak, Boaz; Zuckerman, David; Ravikumar, Pradeep; Price, EricDespite significant successes in understanding the hardness of computational problems based on standard assumptions such as P != NP, there are important settings where the gap between what the best known algorithms achieve and what the best known hardness reductions can rule out is rather stark. This thesis aims at decreasing this gap by proving unconditional lower bounds on powerful algorithmic techniques based on linear (LP) and semi-definite programming (SDP) for Planted Clique and Constraint Satisfaction problems (CSPs). Planted Clique is a central question in average case complexity where the goal is to find a clique of size [omega] planted in the Erdős-Rényi random graph G(n,0.5). While information theoretically, such a clique can be found whenever [omega] >> 2log(n), state-of-the-art polynomial time algorithms succeed only when \omega ~ \sqrt{n}. In fact, the conjectured hardness of detecting planted cliques for \omega << \sqrt{n} has been used as a starting point in many works. We show that algorithms with running time n^{o(log n)} based on the sum-of-squares method cannot detect planted cliques when \omega << \sqrt{n}. Sum-of-Squares method is a general scheme based on SDP that extends natural LP relaxations and spectral methods, captures the best approximation algorithms for fundamental problems and is the prime candidate for refuting the Unique Games Conjecture. Further, we show that any sub-exponential algorithm based on the sum-of-squares method cannot outperform simply returning an assignment at random for any pairwise independent CSP - a large class of CSPs not known be NP-hard to approximate beyond the random assignment threshold. Finally, we show that sub-exponential size (number of inequalities/constraints) LP that encodes MAX-CSP as a linear optimization over any fixed polytope are no more powerful than the well-known Sherali-Adams LP of similar size. This immediately yields a sub-exponential size lower bound for any LP that obtains >0.5 approximation for MaxCut showing an exponential separation between linear and semidefinite programming.