Dynamic Scene Interpretation And Understanding From Two Views





Electrical Engineering


Interpretation of a static or dynamic scene starts with segmentation of the scene, followed by recognition. Our work concentrates on the general problem of segmenting and recognizing animate and inanimate objects in a scene captured from two different views. The two views here refer to either a pair of frames captured by a stereo camera or two frames (with spatial overlap) captured with a moving camera.

The work described in the dissertation starts with an iterative split-and-merge framework for segmentation of an unknown number of objects captured with a stereo camera. The disparity of a scene is modeled by approximating the surfaces in the scene as planar. In the split phase, the number of planar surfaces, along with the underlying plane parameters, is assumed to be known from the initialization or from the previous merge phase. Based on these parameters, planar surfaces in the disparity image are labeled to minimize the residuals between the actual disparity and the modeled disparity. The labeled planar surfaces are separated into spatially continuous regions, which are treated as candidates for the merging that follows. The regions are merged under a maximum variance constraint while maximizing the merged area. A multi-stage branch-and-bound algorithm is proposed to carry out this optimization efficiently.

For moving objects, a framework is proposed for two-view multiple structure-and-motion segmentation. This segmentation problem has three unknowns: the memberships, the corresponding fundamental matrices, and the number of objects. To handle this otherwise recursive problem, hypotheses for the fundamental matrices are generated through local sampling. Once the hypotheses are available, a combinatorial selection problem is formulated to optimize a model selection cost that takes into account the hypothesis likelihoods and the model complexity. An explicit model for the outliers is also added for robust model selection. The model selection cost is minimized through a branch-and-bound procedure.

Following segmentation, object recognition is applied to understand the scene. The segmented objects lack exact boundaries; thus shape-based recognition or classification will not perform well. We follow the more general approach of visual object recognition instead. Visual object recognition relies on spatial image features to identify the objects. State-of-the-art visual object recognition approaches use a visual bag-of-words to represent images. A bag-of-words (also called a bag-of-features) is an orderless collection of invariantly detectable image patches. The approach discards spatial relationships between these patches and gives objects, their context, and the background clutter equal importance. In a modification to the original visual bag-of-words, separate representations are formed for positively and negatively relevant image patches. Experiments demonstrate that this separation improves classification accuracy.
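To make the orderless nature of the visual bag-of-words representation concrete, the following is a minimal sketch of the standard histogram construction, not the modified representation proposed in the dissertation. The function name `bow_histogram`, the toy codebook, and the two-dimensional descriptors are assumptions made purely for illustration; real systems typically use high-dimensional local descriptors and a codebook learned by clustering.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Return an L1-normalized bag-of-words histogram.

    descriptors: (n, d) array of local patch descriptors (hypothetical input).
    codebook:    (k, d) array of visual words (assumed to be learned elsewhere).
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Each patch votes for its nearest visual word; patch positions are ignored,
    # which is what makes the representation orderless.
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)

    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy example: three visual words and four patch descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])
hist = bow_histogram(descriptors, codebook)
print(hist)
```

Because the histogram only counts word occurrences, patches from the object, its context, and the background clutter all contribute equally, which is the limitation that motivates forming separate representations for positively and negatively relevant patches.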