Reinforcement learning in high-diameter, continuous environments
Many important real-world robotic tasks have high diameter, that is, their solution requires a large number of primitive actions by the robot. For example, they may require navigating to distant locations using primitive motor control commands. In addition, modern robots are endowed with rich, high-dimensional sensory systems, providing measurements of a continuous environment. Reinforcement learning (RL) has shown promise as a method for automatic learning of robot behavior, but current methods work best on lowdiameter, low-dimensional tasks. Because of this problem, the success of RL on real-world tasks still depends on human analysis of the robot, environment, and task to provide a useful set of perceptual features and an appropriate decomposition of the task into subtasks. This thesis presents Self-Organizing Distinctive-state Abstraction (SODA) as a solution to this problem. Using SODA a robot with little prior knowledge of its sensorimotor system, environment, and task can automatically reduce the effective diameter of its tasks. First it uses a self-organizing feature map to learn higher level perceptual features while exploring using primitive, local actions. Then, using the learned features as input, it learns a set of high-level actions that carry the robot between perceptually distinctive states in the environment. Experiments in two robot navigation environments demonstrate that SODA learns useful features and high-level actions, that using these new actions dramatically speeds up learning for high-diameter navigation tasks, and that the method scales to large (buildingsized) robot environments. These experiments demonstrate SODAs effectiveness as a generic learning agent for mobile robot navigation, pointing the way toward developmental robots that learn to understand themselves and their environments through experience in the world, reducing the need for human engineering for each new robotic application.