Efficient approaches in network inference

Ray, Avik

Efficient approaches in network inference

Date

2016-12

Authors

Ray, Avik

Abstract

Network based inference is almost ubiquitous in modern machine learning applications. In this dissertation we investigate several such problems motivated by applications in social networks, biological networks, recommendation system, targeted advertising etc. Unavailability of the graph, presence of latent factors, and large network size often make these inference tasks challenging. We develop both generative models and efficient algorithms to solve such problems. We provide analytical guarantees, in terms of accuracy and computation time, for all our algorithms and demonstrate their applicability on many real datasets. This dissertation mainly consists of two parts. In the first part we consider three different problems. We first consider the task of learning the Markov network structure in a discreet graphical model. We develop three fast greedy algorithms to solve this problem which succeeds even in graphs with strong non-neighbor interaction where previous convex optimization based methods fail. Next we consider the problem of learning latent user interests in different topics, using cascades which spread over a network. Our new algorithm infers both user interests and topics in large cascades, better than standard topic modeling algorithms which do not consider the network structure. In the third problem we develop a novel recursive algorithm based on convex relaxation to detect overlapping communities in a graph. The second part of the dissertation develops a mathematical framework to handle different sources of side information and use it to improve inference in networks. However first we demonstrate a much general technique to incorporate variety of side information in estimating a single component of a mixture model e.g. Gaussian mixture model, latent Dirichlet allocation, subspace clustering, and mixed linear regression. We then use a similar technique to solve the problem of identifying a single target community in a graph, using reference nodes or biased node weights as side information. Our algorithms are based on a variant of method of moments, and are much faster and more accurate than other unsupervised and semi-supervised algorithms.