Browsing by Subject "RoboCup"
Cooperation and communication in multiagent deep reinforcement learning (2016-12)
Hausknecht, Matthew John; Stone, Peter, 1971-; Ballard, Dana; Mooney, Ray; Miikkulainen, Risto; Singh, Satinder

Reinforcement learning is the area of machine learning concerned with learning which actions to execute in an unknown environment in order to maximize cumulative reward. As agents begin to perform tasks of genuine interest to humans, they will face environments too complex for humans to predetermine the correct actions with hand-designed solutions. Instead, capable learning agents will be necessary to tackle complex real-world domains. However, traditional reinforcement learning algorithms have difficulty with domains featuring 1) high-dimensional continuous state spaces, for example pixels from a camera image, 2) high-dimensional parameterized-continuous action spaces, 3) partial observability, and 4) multiple independent learning agents. We hypothesize that deep neural networks hold the key to scaling reinforcement learning to complex tasks. This thesis seeks to answer the following two-part question: 1) How can the power of deep neural networks be leveraged to extend reinforcement learning to complex environments featuring partial observability, high-dimensional parameterized-continuous state and action spaces, and sparse rewards? 2) How can multiple deep reinforcement learning agents learn to cooperate in a multiagent setting? To address the first part of this question, this thesis explores the use of recurrent neural networks to combat the partial observability agents experience in the domain of Atari 2600 video games. Next, we design a deep reinforcement learning agent capable of discovering effective policies for the parameterized-continuous action space of the Half Field Offense simulated soccer domain.
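A parameterized-continuous action space like Half Field Offense's pairs each discrete action type with its own continuous parameters (e.g., a dash has a power and a direction). The sketch below is a hypothetical illustration of that interface only; the action names, parameter ranges, and argmax selection rule are assumptions for illustration, not the agent architecture developed in the thesis.

```python
# Hypothetical parameterized action space, loosely modeled on Half Field
# Offense: each discrete action type owns a distinct set of continuous
# parameters, so a policy must choose both the type and its parameters.
ACTIONS = {
    "dash": ["power", "direction"],  # power in [0, 100], direction in degrees
    "turn": ["direction"],
    "kick": ["power", "direction"],
}

def select_action(action_scores, param_values):
    """Pick the highest-scoring discrete action and bind its parameters.

    action_scores: dict mapping action name -> scalar preference score
                   (e.g., the discrete head of an actor network)
    param_values:  dict mapping action name -> list of continuous parameters
                   (e.g., the continuous head of the same network)
    """
    best = max(action_scores, key=action_scores.get)
    return best, dict(zip(ACTIONS[best], param_values[best]))

scores = {"dash": 0.9, "turn": 0.1, "kick": 0.4}
params = {"dash": [80.0, 15.0], "turn": [-30.0], "kick": [50.0, 0.0]}
action, kwargs = select_action(scores, params)
```

The point of the sketch is the data layout: because only the chosen action's parameters are executed, a learning algorithm must propagate credit to both the discrete choice and the continuous parameters that accompanied it.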
To address the second part of this question, this thesis investigates architectures and algorithms suited to cooperative multiagent learning. We demonstrate that sharing parameters and memories between deep reinforcement learning agents fosters policy similarity, which can result in cooperative behavior. Additionally, we hypothesize that communication can further aid cooperation, and we present the Grounded Semantic Network (GSN), which learns a communication protocol grounded in the observation space and reward function of the task. In general, we find that the GSN is effective in domains featuring partial observability and asymmetric information. All in all, this thesis demonstrates that reinforcement learning combined with deep neural network function approximation can produce algorithms capable of discovering effective policies for domains with partial observability, parameterized-continuous action spaces, and sparse rewards. Additionally, we demonstrate that single-agent deep reinforcement learning algorithms extend naturally to cooperative multiagent tasks featuring learned communication. These results represent a non-trivial step towards extending agent-based AI to complex environments.

Learning in simulation for real robots (2012-05)
Farchy, Alon; Stone, Peter, 1971-; Ballard, Dana H.

Simulation is often used in research and industry as a low-cost, high-efficiency alternative to testing on real hardware. Simulation has also been used to develop and test powerful learning algorithms. However, values optimized in simulation do not translate directly to optimized behavior on the real system. In fact, heavy optimization in simulation is likely to exploit the simplifications the simulator makes. This observation calls into question the utility of learning in simulation. The UT Austin Villa 3D Simulation Team developed an optimization framework on which a robot agent was trained to maximize the speed of an omni-directional walk.
The resulting agent won first place in the 2011 RoboCup 3D Simulation League. This thesis presents the adaptation of this optimization framework to learn parameters in simulation that improved the forward walk speed of the real Aldebaran Nao. By constraining the simulation with tree models learned from the real robot, and manually guiding the optimization based on testing on the real robot, the Nao's walk speed was improved by about 30%.
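The grounding loop described above, constraining the simulator with models learned from the real robot and then re-optimizing in simulation, can be sketched as a simple alternation. Everything below is a hypothetical placeholder: the function names, the hill-climbing optimizer, and the loop structure are illustrative assumptions, not the actual UT Austin Villa optimization framework.

```python
import random

def optimize_in_simulation(simulate, params, iterations=100):
    """Toy hill climber: perturb walk parameters, keep changes that the
    (grounded) simulator scores as faster. Stands in for the real
    optimization framework, which this sketch does not reproduce."""
    best, best_speed = list(params), simulate(params)
    for _ in range(iterations):
        candidate = [p + random.gauss(0.0, 0.1) for p in best]
        speed = simulate(candidate)
        if speed > best_speed:
            best, best_speed = candidate, speed
    return best

def grounded_sim_to_real(simulate, ground, evaluate_on_robot, params, rounds=3):
    """Alternate grounding and optimization (hypothetical interface):

    ground(params)            -> a simulator re-fit to real-robot data
    evaluate_on_robot(params) -> measured walk speed on the physical Nao,
                                 used to manually guide the next round
    """
    for _ in range(rounds):
        simulate = ground(params)            # constrain sim with learned models
        params = optimize_in_simulation(simulate, params)
        _ = evaluate_on_robot(params)        # real-world check between rounds
    return params
```

The design point is that optimization never runs against a fixed simulator for long: each round re-grounds the simulator in real-robot behavior, which limits how much the optimizer can exploit simulation-only artifacts.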