Browsing by Subject "Kernel methods"
Now showing 1 - 2 of 2
Item: Large-scale non-linear prediction with applications (2016-08)
Si, Si, Ph.D.; Dhillon, Inderjit S.; Grauman, Kristen; Keerthi, Selvaraj Sathiya; Mooney, Raymond

With an immense growth in data, there is a great need for training and testing machine learning models on very large data sets. Several standard non-linear algorithms based on either kernels (e.g., kernel support vector machines and kernel ridge regression) or decision trees (e.g., gradient boosted decision trees) often yield superior predictive performance on various machine learning tasks compared to linear methods; however, they suffer from severe computation and memory challenges when scaling to millions of data instances. To overcome these challenges, we develop a family of scalable kernel-approximation-based and decision-tree-based algorithms that reduce the computational cost of non-linear methods in terms of training time, prediction time, and memory usage. We further show their superior performance on a wide range of machine learning tasks, including large-scale classification, regression, and extreme multi-label learning. In particular, we make the following contributions: (1) we develop a family of memory-efficient kernel approximation algorithms by exploiting the structure of kernel matrices; the proposed kernel approximation scheme can significantly speed up the training phase of kernel machines; (2) we make the connection between forming a kernel approximation and predicting new instances using kernel machines, and propose a series of improvements over the classical Nyström kernel approximation method; we show that these improvements result in an order-of-magnitude speed-up in prediction time on large-scale classification and regression tasks with millions of training instances; (3) we overcome the challenges of applying decision trees to the extreme multi-label classification problem, which can involve more than 100,000 different labels, and develop the first Gradient Boosting Decision Tree (GBDT) algorithm for extreme multi-label learning. We show that the modified GBDT algorithm achieves substantial reductions in prediction time and model size.
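The abstract refers to the Nyström method. As a rough, non-authoritative sketch of that baseline technique (the classical uniform-sampling scheme, not the dissertation's improved variants), here is a minimal Python/NumPy illustration; the RBF kernel, the landmark count m, and the helper names are illustrative assumptions:

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """RBF kernel matrix between the rows of A and the rows of B."""
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * np.maximum(sq, 0.0))

    def nystrom_features(X, m=100, gamma=1.0, seed=0):
        """Rank-m Nystrom feature map Phi with K approximately Phi @ Phi.T.

        Uses uniform landmark sampling, the classical scheme; the
        structure-exploiting improvements the abstract describes are
        not reproduced here.
        """
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=m, replace=False)
        C = rbf_kernel(X, X[idx], gamma)   # n x m slice of the full kernel
        W = C[idx]                         # m x m block between landmarks
        U, s, _ = np.linalg.svd(W)         # W = U diag(s) U^T since W is PSD
        Phi = C @ U / np.sqrt(np.maximum(s, 1e-12))
        return Phi                         # n x m feature matrix

A linear classifier or ridge regression trained on Phi then stands in for the full kernel machine, requiring only the n x m kernel slice rather than the full n x n kernel matrix.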
Item: Unsupervised learning methods: An efficient clustering framework with integrated model selection (2012-08)
Corona, Enrique; Nutter, Brian; Mitra, Sunanda; Pal, Ranadip; López-Benitez, Noé

Classification is one of the most important practices in data analysis. In the context of machine learning, it can be viewed as the problem of identifying representative data patterns in such a way that coherent groups are formed. When the structure of the data is known in advance (as in supervised learning), it is used directly to establish classification rules for discrimination. When the data is unlabeled, however, its underlying structure must first be uncovered, so unsupervised classification poses additional challenges. Chief among them is the fundamental question of the appropriate number of groups, or clusters, in the data. In this context, the "jump" method, an efficient but limited linear approach that finds plausible answers to the number of clusters in a dataset, is improved via the optimization of an appropriate objective function that quantifies the quality of particular cluster configurations. Recent developments showing interesting associations between spectral clustering (SC) and kernel principal component analysis (KPCA) are used to extend the improved method to the non-linear domain. This is achieved by mapping the input data to a new space in which the original clusters appear as linear structures. The characteristics of this mapping depend largely on the parameters of the selected kernel function. By projecting these linear structures onto the unit sphere, the proposed method is able to measure the quality of the resulting cluster configurations. These quality scores aid in the simultaneous selection of the kernel parameters (i.e., model selection) and the number of clusters present in the dataset. Results of the enhanced jump method are compared with other relative validation criteria, such as the minimum description length (MDL), Akaike's information criterion (AIC), and the consistent Akaike information criterion (CAIC). The extension of the method is tested in similar settings against other cluster validity indices, such as the adjusted Rand index (ARI) and the balanced line fit (BLF). Finally, image segmentation examples are presented as a real-world application of the technique.
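As a similarly hedged sketch of the baseline "jump" statistic that the thesis improves upon (the classical Sugar-James formulation, not the enhanced objective described above), assuming k-means distortion and the conventional transformation power of d/2 for d-dimensional data:

    import numpy as np
    from sklearn.cluster import KMeans

    def jump_method(X, k_max=10):
        """Choose a cluster count with the classical jump statistic.

        Fits k-means for k = 1..k_max, transforms the average distortion
        by the power -d/2, and returns the k with the largest jump
        between consecutive transformed distortions.
        """
        power = X.shape[1] / 2.0          # conventional transform power d/2
        transformed = [0.0]               # d_0^{-power} is 0 by convention
        for k in range(1, k_max + 1):
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            distortion = km.inertia_ / len(X)  # average within-cluster distortion
            transformed.append(distortion ** (-power))
        jumps = np.diff(transformed)      # jump at k: d_k^{-p} - d_{k-1}^{-p}
        return int(np.argmax(jumps)) + 1  # k in 1..k_max with the largest jump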
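The unit-sphere projection described above is reminiscent of the row normalization used in spectral clustering. As a loose sketch of such a non-linear embedding, assuming an RBF kernel (the thesis's specific quality scores for jointly selecting the kernel parameter and the cluster count are not reproduced here):

    import numpy as np

    def sphere_embedding(X, k, gamma=1.0):
        """Embed data with the top-k kernel eigenvectors, then project
        each embedded point onto the unit sphere, so clusters that form
        linear structures in feature space become tight bundles of
        directions."""
        sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
        K = np.exp(-gamma * np.maximum(sq, 0.0))  # RBF kernel matrix
        vals, vecs = np.linalg.eigh(K)            # eigenvalues in ascending order
        V = vecs[:, -k:]                          # leading k eigenvectors
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        return V / np.maximum(norms, 1e-12)       # rows lie on the unit sphere

Scanning this embedding over a grid of (gamma, k) pairs and scoring each resulting clustering then yields the kind of simultaneous model-selection loop the abstract describes.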