Scalable kernel methods for machine learning



Machine learning techniques are now essential for a diverse set of applications in computer vision, natural language processing, software analysis, and many other domains. As more applications emerge and the amount of data continues to grow, there is a need for increasingly powerful and scalable techniques. Kernel methods, which generalize linear learning methods to non-linear ones, have become a cornerstone of much recent work in machine learning and have been used successfully for many core machine learning tasks such as clustering, classification, and regression. Despite the recent popularity of kernel methods, a number of issues must be tackled in order for them to succeed on large-scale data. First, kernel methods typically require memory that grows quadratically in the number of data objects, making it difficult to scale to large data sets. Second, kernel methods depend on an appropriate kernel function (an implicit mapping to a high-dimensional space), and it is not clear how to choose one, since the right choice depends on the data. Third, in the context of data clustering, kernel methods have not been demonstrated to be practical for real-world clustering problems. This thesis explores these issues, offers some novel solutions to them, and applies the results to a number of challenging applications in computer vision and other domains. We explore two broad fundamental problems in kernel methods. First, we introduce a scalable framework for learning kernel functions that incorporates prior knowledge from the data. This framework scales to very large data sets of millions of objects, can be used for a variety of complex data, and outperforms several existing techniques. In the transductive setting, the method can be used to learn low-rank kernels, whose memory requirements are linear in the number of data points.
We also explore extensions of this framework and applications to image search problems, such as object recognition, human body pose estimation, and 3-D reconstruction. As a second problem, we explore the use of kernel methods for clustering. We show a mathematical equivalence between several graph cut objective functions and the weighted kernel k-means objective. This equivalence leads to the first eigenvector-free algorithm for weighted graph cuts, which is thousands of times faster than existing state-of-the-art techniques while using significantly less memory. We benchmark this algorithm against existing methods, apply it to image segmentation, and explore extensions to semi-supervised clustering.
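
To make the weighted kernel k-means objective concrete, the following is a minimal sketch of a batch weighted kernel k-means iteration that works only with a precomputed kernel matrix and never computes eigenvectors. It is an illustration under stated assumptions, not the algorithm developed in the thesis: the function name `weighted_kernel_kmeans`, its signature, and the toy data are hypothetical, and the particular choice of kernel matrix and weights that corresponds to a given graph cut objective is not given in this abstract and is not reproduced here.

```python
import numpy as np


def weighted_kernel_kmeans(K, w, k, labels, max_iter=100):
    """Batch weighted kernel k-means on a precomputed kernel matrix.

    Hypothetical helper for illustration only.
    K      : (n, n) symmetric kernel matrix
    w      : (n,) non-negative point weights
    k      : number of clusters
    labels : (n,) initial cluster assignment with values in 0..k-1
    """
    n = K.shape[0]
    labels = np.asarray(labels).copy()
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            wc = w[mask]
            s = wc.sum()
            if s == 0:
                continue  # empty cluster: its distance stays at infinity
            # Cross term: sum_j w_j K_ij / s, for every point i
            cross = K[:, mask] @ wc / s
            # Within-cluster term: sum_{j,l} w_j w_l K_jl / s^2
            within = wc @ K[np.ix_(mask, mask)] @ wc / s**2
            # Squared feature-space distance to the weighted cluster mean
            dist[:, c] = diag - 2 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy usage: two well-separated groups, linear kernel, uniform weights.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
K = X @ X.T
w = np.ones(len(X))
init = np.array([0, 0, 1, 1, 1, 1])  # deliberately imperfect starting assignment
labels = weighted_kernel_kmeans(K, w, 2, init)
print(labels)
```

With a linear kernel and uniform weights, as in the toy usage, the update reduces to ordinary k-means. The sketch stores the full kernel matrix, so it inherits the quadratic memory cost discussed above, and it omits the refinements a practical implementation would need.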