Region detection and matching for object recognition
In this thesis, I explore region detection and consider its impact on image matching for exemplar-based object recognition. Detecting regions is important because regions provide semantically meaningful spatial cues in images. Matching establishes similarity between visual entities, which is crucial for recognition. My thesis begins by detecting regions at both the local and the object level. Then, I leverage geometric cues of the detected regions to improve image matching for the ultimate goal of object recognition. More specifically, my thesis considers four key questions: 1) how can we extract distinctively shaped local regions that are also repeatable enough for robust matching? 2) how can object-level shape inform bottom-up image segmentation? 3) how should the spatial layout imposed by segmented regions influence image matching for exemplar-based recognition? and 4) how can we exploit regions to improve the accuracy and speed of dense image matching? I propose novel algorithms to tackle these questions, addressing region-based visual perception at every level: low-level local region extraction, mid-level object segmentation, and high-level region-based matching and recognition. First, I propose a Boundary Preserving Local Region (BPLR) detector to extract local shapes. My approach defines a novel spanning-tree-based image representation whose structure reflects shape cues combined from multiple segmentations, which in turn provide multiple initial hypotheses of the object boundaries. Unlike traditional local region detectors that rely on local cues such as color and texture, BPLRs explicitly exploit segmentations that encode global object shape. Thus, they respect object boundaries more robustly and produce fewer noisy regions that straddle object boundaries. The resulting detector yields a dense set of local regions that are both distinctive in shape and repeatable for robust matching. Second, building on the strength of the BPLR regions, I develop an approach for object-level segmentation.
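To make the spanning-tree idea more concrete, here is a toy sketch in Python. It is not the BPLR detector itself, only an illustration under assumptions of our own: it works on pixels of a 4-connected grid rather than on sampled elements, scores each grid edge by the fraction of the input segmentations that place its two endpoints in different segments, builds a minimum spanning tree with Prim's algorithm so that the tree crosses likely boundaries as rarely as possible, and then grows a local region from a seed along tree edges. The function names (`boundary_weight`, `spanning_tree`, `grow_region`) and the 0.5 threshold are hypothetical choices for illustration.

```python
import heapq
from collections import deque


def boundary_weight(segs, p, q):
    """Fraction of segmentations that put pixels p and q in different segments."""
    return sum(s[p[0]][p[1]] != s[q[0]][q[1]] for s in segs) / len(segs)


def spanning_tree(segs):
    """Prim's minimum spanning tree over a 4-connected pixel grid.

    Cheap edges lie inside segments, so the tree crosses segment
    boundaries (agreed on by many segmentations) as rarely as possible.
    Returns an adjacency dict: pixel -> list of (neighbor, edge weight).
    """
    h, w = len(segs[0]), len(segs[0][0])

    def neighbors(p):
        r, c = p
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < h and 0 <= cc < w:
                yield (rr, cc)

    tree = {(r, c): [] for r in range(h) for c in range(w)}
    start = (0, 0)
    visited = {start}
    heap = [(boundary_weight(segs, start, q), start, q) for q in neighbors(start)]
    heapq.heapify(heap)
    while heap:
        wgt, p, q = heapq.heappop(heap)
        if q in visited:
            continue  # q already joined the tree through a cheaper edge
        visited.add(q)
        tree[p].append((q, wgt))
        tree[q].append((p, wgt))
        for n in neighbors(q):
            if n not in visited:
                heapq.heappush(heap, (boundary_weight(segs, q, n), q, n))
    return tree


def grow_region(tree, seed, max_hops, max_weight=0.5):
    """Collect pixels within `max_hops` tree edges of `seed`, never crossing
    an edge that more than `max_weight` of the segmentations mark as a boundary."""
    region, queue = {seed}, deque([(seed, 0)])
    while queue:
        p, d = queue.popleft()
        if d == max_hops:
            continue
        for q, wgt in tree[p]:
            if q not in region and wgt <= max_weight:
                region.add(q)
                queue.append((q, d + 1))
    return region
```

For example, with three segmentations of a 4x4 grid, two of which split it into left and right halves (`[[0, 0, 1, 1] for _ in range(4)]`) and one of which splits off only the last column, a region grown from the top-left pixel with `max_hops=10` stays within the left two columns, because the tree edge joining the halves is marked as a boundary by two of the three segmentations.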
The key insight of the approach is that object shapes are (at least partially) shared among different object categories: for example, among different animals, among different vehicles, or even among seemingly different objects. This shape-sharing phenomenon allows us to use partial shape matching via BPLR-detected regions to predict the global shape of possibly unfamiliar objects in new images. Unlike existing top-down methods, my approach requires no category-specific knowledge of the object to be segmented. In addition, because it relies on exemplar-based matching to generate shape hypotheses, my approach overcomes the viewpoint sensitivity of existing methods by allowing shape exemplars to span arbitrary poses and classes. For the ultimate goal of region-based recognition, it is not enough to detect good regions; we must also be able to match them reliably. Thus, in the third major component of this thesis, I explore how to leverage geometric cues of the segmented regions for accurate matching between visual entities such as images, objects, or scenes. To this end, I propose a segmentation-guided local feature matching strategy, in which segmentation suggests the spatial layout of the matched local features within each region. To encode such spatial structure, I devise a string representation whose 1D nature enables efficient computation to enforce geometric constraints. I apply the method to exemplar-based object classification to demonstrate the impact of my segmentation-driven matching approach. Finally, building on the idea of using regions for geometric regularization in image matching, I consider how a hierarchy of nested image regions can be used to constrain dense image feature matches at multiple scales simultaneously.
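One plausible reading of the string idea can be sketched in Python, under assumptions of our own rather than the thesis's exact formulation: the features inside a region are ordered by their polar angle around the region's centroid, turning the region into a 1D string, and two such strings are compared with an order-preserving dynamic program (a weighted longest common subsequence). Because the alignment cannot cross itself, matches that scramble the spatial order earn no credit, and the 1D structure keeps each alignment at O(nm). The angular ordering, the toy `similarity` score, and the step that tries every cyclic shift are all hypothetical.

```python
import math


def to_string(features):
    """Turn one region's features (x, y, descriptor) into a 1D string by
    sorting them by polar angle around the region centroid."""
    cx = sum(f[0] for f in features) / len(features)
    cy = sum(f[1] for f in features) / len(features)
    return sorted(features, key=lambda f: math.atan2(f[1] - cy, f[0] - cx))


def similarity(f, g):
    """Toy appearance score in [0, 1]: 1 for identical descriptors,
    0 once the descriptors are at least 1 apart."""
    return max(0.0, 1.0 - math.dist(f[2], g[2]))


def align(s, t):
    """Order-preserving alignment (weighted longest common subsequence)
    computed by dynamic programming in O(len(s) * len(t))."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1] + similarity(s[i - 1], t[j - 1]))
    return dp[n][m]


def match_regions(s, t):
    """Angular strings have no natural starting point, so try every cyclic shift of t."""
    return max(align(s, t[k:] + t[:k]) for k in range(len(t)))
```

As an illustration, take four features at the compass points with descriptors `(0.0,)` through `(3.0,)`. A second region with the same descriptors in the same arrangement, even rotated by 90 degrees, scores 4.0, whereas one with the same descriptors in a scrambled arrangement scores only 2.0, although counting appearance matches alone would pair all four features in both cases.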
Moving beyond individual regions, the last part of my thesis studies how to exploit the inherent hierarchical structure of regions to improve image matching. To this end, I propose a deformable spatial pyramid graphical model for image matching. The model considers multiple spatial extents at once, from the entire image to grid cells to every single pixel. It strikes a balance between robust regularization by larger spatial supports on the one hand and accurate localization by finer regions on the other. Moreover, the pyramid structure lends itself to fast coarse-to-fine hierarchical optimization. I apply the method to pixel label transfer for semantic image segmentation, improving upon the state of the art in both accuracy and speed. Throughout, I provide extensive evaluations on challenging benchmark datasets that validate the effectiveness of my approach. In contrast to traditional texture-based object recognition, my region-based approach makes it possible to use strong geometric cues, such as shape and spatial layout, that advance the state of the art in object recognition. I also show that the inherent hierarchical structure of regions allows fast image matching for scalable recognition. Together, these results realize the promising potential of region-based visual perception. In addition, all of my code for the local shape detector, object segmentation, and image matching is publicly available; I hope it will serve as a useful addition to vision researchers' toolbox.
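The balance between regularization by large regions and localization by small ones can be illustrated with a toy Python sketch of inference on a pyramid-shaped graph. The assumptions are ours: each node (node 0 is the whole image, its children are grid cells, and so on) picks one displacement from a shared list of candidates; the energy sums each node's matching cost plus a smoothness penalty `lam` times the L1 distance between parent and child displacements; and since only parent-child edges are used, the graph is a tree, so the energy can be minimized exactly by dynamic programming from the leaves to the root. This exact solver is a stand-in, not the fast coarse-to-fine optimization described above, and `solve_pyramid`, `unary`, and `lam` are hypothetical names.

```python
def solve_pyramid(children, unary, labels, lam):
    """Exact min-sum inference on a tree-structured spatial pyramid.

    children: dict node -> list of child nodes (node 0 is the whole image)
    unary:    dict node -> list of matching costs, one per candidate label
    labels:   candidate displacements (dx, dy), shared by all nodes
    lam:      weight of the parent-child smoothness term
    Returns (dict node -> chosen label index, minimum total energy).
    """
    K = len(labels)

    def smooth(a, b):
        # L1 distance between displacements a and b, scaled by lam
        return lam * (abs(labels[a][0] - labels[b][0]) + abs(labels[a][1] - labels[b][1]))

    cost = {}  # cost[node][k]: min energy of node's subtree when node takes label k
    arg = {}   # arg[(child, k)]: child's best label when its parent takes label k

    def upward(node):
        total = list(unary[node])
        for ch in children.get(node, []):
            upward(ch)
            for k in range(K):
                best = min(range(K), key=lambda j: cost[ch][j] + smooth(k, j))
                arg[(ch, k)] = best
                total[k] += cost[ch][best] + smooth(k, best)
        cost[node] = total

    upward(0)
    # Decode top-down: pick the root's best label, then each child's best response.
    chosen = {0: min(range(K), key=lambda k: cost[0][k])}
    stack = [0]
    while stack:
        node = stack.pop()
        for ch in children.get(node, []):
            chosen[ch] = arg[(ch, chosen[node])]
            stack.append(ch)
    return chosen, cost[0][chosen[0]]
```

In a two-level example with candidates `[(0, 0), (1, 0), (2, 0)]`, where the root and one cell prefer displacement `(1, 0)` but the other cell weakly prefers `(0, 0)`, a large `lam` pulls every node to `(1, 0)`, while a small `lam` lets the dissenting cell keep its own displacement.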