Optimal visual search strategies using natural scene statistics
Raj, Raghu G., 1975-
I present theoretical foundations and perform computational studies on optimal search strategies in natural scenes performed by foveated artificial vision systems, based on novel characterizations of Natural Scene Statistics (NSS). I first develop relevant theoretical bounds on the processing of foveated--more generally LSV-filtered (Linear Scale Variant)--signals, which provide a rigorous basis for linear post-processing operations performed on foveated images. The major contribution of this dissertation, however, lies in the discovery and elucidation of two major statistical characterizations of natural scenes and their subsequent deployment for devising optimal fixation strategies. The first is a novel characterization of the contrast statistics of natural scenes, parameterized by the eccentricity at which each contrast level is measured across the LSV-filtered image. This formulation of contrast statistics finds natural application in devising fixation patterns that optimally extract contrast information from the image. I further demonstrate that the resulting fixation patterns are nearly optimal in the sense of minimizing the global MSE of the LSV-filtered image. The second is the characterization of the non-stationary structure of natural images via the development of the concept of non-stationarity indices that measure the extent of non-stationarity across the image. The theoretical motivation of this approach lies in a novel characterization of image patch statistics that I developed, called Multilinear Independent Component Analysis (MICA), wherein the statistical interactions between the pseudo-independent components are captured via a multilinear expansion of the joint probability density being modeled.
This modeling technique enables the derivation of a theoretical measure of non-stationarity in natural scenes that subsequently motivates computationally efficient non-stationarity indices--a variant of which is then deployed to furnish optimal texture-based fixations in natural images. The fixation patterns generated by these information-theoretic approaches are quantitatively shown to match very well with human fixation patterns and offer considerable explanatory and predictive power over previously well-known fixation strategies. These results point the way towards a unified information-theoretic understanding of low-level fixation processes, and further demonstrate the importance of incorporating low-level visual information into visual search strategies--thereby providing a foundation upon which high-level visual information relating to scene context and object structures can be incorporated.