Robust scene recognition strategy (SRS)

Navigation is one of the most basic abilities of most animals. Successful navigation depends on recognising previously visited scenes: locating potential food sources, avoiding dangers and following migration routes are a few examples where recognising a place or scene is crucial to the survival of a species.

In mobile robotics, the ability to recognise a previously visited place is known as 'loop-closure detection' in the context of SLAM (Simultaneous Localisation and Mapping). It is so named because the robot needs to perform scene recognition at the end of a loop so that the uncertainty in its current position does not grow without bound. A robot that cannot detect loop closure is essentially lost, which again highlights the importance of scene recognition for autonomous robotic navigation.

Accurate scene recognition is surprisingly difficult. Shown below are four typical scenes with various forms of distortion that make scene recognition challenging: viewpoint changes, changes due to natural erosion and clutter, and variations in illumination caused by shadows and foliage.

Scene recognition, or loop-closure detection, can be achieved by a variety of methods, depending on the kinds of sensors used. The main aim of this research is scene recognition using vision. Inspired by the apparent ease with which certain flying insects, such as bees and wasps, navigate through complex and often dynamically changing natural environments, this research proposes a novel way to achieve scene recognition using ordinal measures of landmark configuration. The work has three important components: the use of visual saliency for landmark selection; the use of SURF (Speeded Up Robust Features) for efficient encoding of the landmarks; and a similarity metric, based on ordinal measures of rank correlation, for determining the similarity between two given scenes. A simple decision module is also proposed: it uses statistics of the computed similarity metric to estimate an adaptive decision threshold for accepting or rejecting the input query scene.
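As a rough illustration of the similarity metric and decision module described above, the sketch below scores two scenes by the Kendall rank correlation of their landmark orderings and derives a threshold from the statistics of known non-matching scores. Several things here are assumptions rather than details from this work: the choice of Kendall's tau as the rank correlation, the idea that landmarks have already been matched and given shared IDs, the ordering of landmarks (for example left to right in the image), and the mean-plus-k-standard-deviations threshold rule. All function names and the parameter k are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same landmark IDs.

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    """
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("both orderings must contain the same unique landmark IDs")
    if len(order_a) < 2:
        raise ValueError("at least two landmarks are needed")

    rank_b = {landmark: i for i, landmark in enumerate(order_b)}
    concordant = discordant = 0
    # x always precedes y in order_a; check whether order_b agrees.
    for x, y in combinations(order_a, 2):
        if rank_b[x] < rank_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def adaptive_threshold(negative_scores, k=2.0):
    """Assumed rule: mean of non-matching scores plus k sample standard deviations."""
    return mean(negative_scores) + k * stdev(negative_scores)


def accept_query(query_score, negative_scores, k=2.0):
    """Accept the query scene only if its score clears the adaptive threshold."""
    return query_score > adaptive_threshold(negative_scores, k)
```

For example, kendall_tau(["a", "b", "c"], ["a", "c", "b"]) counts two agreeing pairs and one disagreeing pair out of three. Because only the relative order of landmarks enters the score, small shifts in landmark position that preserve the ordering leave it unchanged.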


The image database is organised by the four habitats; each habitat is in turn divided into reference scenes, positive test scenes and negative test scenes. Each scene has two frames, named *l.jpg (left frame) and *r.jpg (right frame).
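When loading the database, the left and right frames of each scene can be paired using the naming convention above. The following is a minimal sketch that assumes only the *l.jpg / *r.jpg suffix rule; the function name and the example file names are hypothetical, and the actual folder names inside the archive are not assumed.

```python
from pathlib import Path


def pair_stereo_frames(filenames):
    """Group '<scene>l.jpg' and '<scene>r.jpg' files into (left, right) pairs.

    Returns a dict mapping the scene path (without the l/r suffix) to a
    (left_file, right_file) tuple. Scenes missing either frame are skipped.
    """
    frames = {}
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() != ".jpg" or not path.stem:
            continue  # not a JPEG frame
        side = path.stem[-1]
        if side not in ("l", "r"):
            continue  # does not follow the l/r naming convention
        scene = str(path.parent / path.stem[:-1])
        frames.setdefault(scene, {})[side] = name

    return {
        scene: (sides["l"], sides["r"])
        for scene, sides in sorted(frames.items())
        if "l" in sides and "r" in sides
    }
```

Given the hypothetical names scene01l.jpg, scene01r.jpg and scene02l.jpg, only scene01 is returned as a complete pair, since scene02 has no right frame.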

Image database (zipped file) - 52 MB