Browsing by Subject "POMDP"
Now showing 1 - 3 of 3
Item: Cooperation and communication in multiagent deep reinforcement learning (2016-12)
Hausknecht, Matthew John; Stone, Peter, 1971-; Ballard, Dana; Mooney, Ray; Miikkulainen, Risto; Singh, Satinder

Reinforcement learning is the area of machine learning concerned with learning which actions to execute in an unknown environment in order to maximize cumulative reward. As agents begin to perform tasks of genuine interest to humans, they will be faced with environments too complex for humans to predetermine the correct actions using hand-designed solutions. Instead, capable learning agents will be necessary to tackle complex real-world domains. However, traditional reinforcement learning algorithms have difficulty with domains featuring 1) high-dimensional continuous state spaces, for example pixels from a camera image, 2) high-dimensional parameterized-continuous action spaces, 3) partial observability, and 4) multiple independent learning agents. We hypothesize that deep neural networks hold the key to scaling reinforcement learning towards complex tasks. This thesis seeks to answer the following two-part question: 1) How can the power of deep neural networks be leveraged to extend reinforcement learning to complex environments featuring partial observability, high-dimensional parameterized-continuous state and action spaces, and sparse rewards? 2) How can multiple deep reinforcement learning agents learn to cooperate in a multiagent setting? To address the first part of this question, this thesis explores the idea of using recurrent neural networks to combat the partial observability experienced by agents in the domain of Atari 2600 video games. Next, we design a deep reinforcement learning agent capable of discovering effective policies for the parameterized-continuous action space found in the Half Field Offense simulated soccer domain.
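The recurrent-network idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis's architecture; the cell below is a hand-rolled one-unit recurrence with invented weights, meant only to show why a hidden state lets an agent act on observation history rather than the latest observation alone.

```python
import math

def recurrent_q_step(obs, hidden, w_in=0.5, w_rec=0.9):
    """Fold one observation into the hidden state, then score actions.

    Weights and the 3-action Q head are illustrative, not learned.
    """
    hidden = math.tanh(w_in * obs + w_rec * hidden)  # memory of past observations
    q_values = [hidden * a for a in (-1.0, 0.0, 1.0)]  # toy Q head over 3 actions
    return hidden, q_values

def greedy_policy(observations):
    """Roll the recurrent state over a history; return the final greedy action."""
    hidden = 0.0
    action = 0
    for obs in observations:
        hidden, q = recurrent_q_step(obs, hidden)
        action = max(range(len(q)), key=lambda i: q[i])
    return action

# Two histories that end in the same (uninformative) observation can still
# produce different actions, because the hidden state remembers the past:
a1 = greedy_policy([1.0, 0.0])   # earlier evidence on one side
a2 = greedy_policy([-1.0, 0.0])  # earlier evidence on the other side
```

A memoryless policy would be forced to act identically on the final observation `0.0` in both histories; the recurrent state is what breaks that tie under partial observability.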
To address the second part of this question, this thesis investigates architectures and algorithms suited for cooperative multiagent learning. We demonstrate that sharing parameters and memories between deep reinforcement learning agents fosters policy similarity, which can result in cooperative behavior. Additionally, we hypothesize that communication can further aid cooperation, and we present the Grounded Semantic Network (GSN), which learns a communication protocol grounded in the observation space and reward function of the task. In general, we find that the GSN is effective on domains featuring partial observability and asymmetric information. All in all, this thesis demonstrates that reinforcement learning combined with deep neural network function approximation can produce algorithms capable of discovering effective policies for domains with partial observability, parameterized-continuous action spaces, and sparse rewards. Additionally, we demonstrate that single-agent deep reinforcement learning algorithms can be naturally extended towards cooperative multiagent tasks featuring learned communication. These results represent a non-trivial step towards extending agent-based AI to complex environments.

Item: Feedback-based Information Roadmap (FIRM): Graph-based Estimation and Control of Robotic Systems Under Uncertainty (2014-05-07)
Aghamohammadi, Aliakbar

This dissertation addresses the problem of stochastic optimal control with imperfect measurements. The main application of interest is robot motion planning under uncertainty. In the presence of process uncertainty and imperfect measurements, the system's state is unknown, and a state estimation module is required to provide the information-state (belief), which is the probability distribution function (pdf) over all possible states. Accordingly, successful robot operation in such a setting requires reasoning about the evolution of the information-state and its quality in future time steps.
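The "belief" the paragraph above describes can be made concrete with a toy discrete Bayes filter. The three-state model, transition matrix, and measurement likelihood below are all invented for illustration; the point is only that the belief is a pdf over states, updated by a predict step (process model) and a correct step (measurement model).

```python
def belief_update(belief, transition, likelihood):
    """One predict-correct step of a discrete Bayes filter over n states."""
    n = len(belief)
    # Predict: push the current belief through the process (motion) model.
    predicted = [sum(transition[i][j] * belief[i] for i in range(n))
                 for j in range(n)]
    # Correct: weight by the likelihood of the received measurement, normalize.
    unnorm = [likelihood[j] * predicted[j] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

T = [[0.8, 0.2, 0.0],   # illustrative transition model (rows sum to 1)
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
L = [0.1, 0.7, 0.2]     # P(observation | state), for the measurement received
b = belief_update([1.0, 0.0, 0.0], T, L)  # belief after one step from state 0
```

Planning in the information space, as FIRM does, means reasoning about how distributions like `b` evolve under future controls and measurements, rather than about a single known state.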
In its most general form, this is modeled as a Partially Observable Markov Decision Process (POMDP) problem. Unfortunately, the exact solution of this problem over continuous spaces in the presence of constraints is computationally intractable. Correspondingly, state-of-the-art methods that provide approximate solutions are limited to problems with short horizons and small domains. The main challenge for these problems is the exponential growth of the search tree in the information space, as well as the dependency of the entire search tree on the initial belief. Inspired by sampling-based (roadmap-based) methods, this dissertation proposes a method to construct a "graph" in information space, called the Feedback-based Information RoadMap (FIRM). Each FIRM node is a probability distribution and each FIRM edge is a local controller. The concept of belief stabilizers is introduced as a way to steer the current belief toward FIRM nodes and induce belief reachability. The solution provided by the FIRM framework is a feedback law over the information space, obtained by switching among locally distributed feedback controllers. By exploiting such a graph in planning, the intractable POMDP problem over continuous spaces is reduced to a tractable MDP (Markov Decision Process) problem over the graph (FIRM) nodes. FIRM is the first graph generated in the information space that preserves the principle of optimality, i.e., the costs associated with different edges of FIRM are independent of each other. Unlike forward-search methods on tree structures, the plans produced by FIRM are independent of the initial belief (i.e., plans are query-independent). As a result, they are robust and reliable. They are robust in the sense that if the system's belief deviates from the planned belief, replanning is feasible in real time, as the computed solution is a feedback over the entire belief graph.
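The payoff of edge-cost independence described above is that planning collapses to dynamic programming on a small graph. The sketch below is not FIRM itself; it is a deterministic-cost toy (invented 4-node roadmap, invented edge costs) showing how, once each edge carries its own cost, a feedback plan over graph nodes falls out of plain value iteration instead of a tree search in continuous belief space.

```python
def solve_graph_mdp(edges, goal, n_nodes, sweeps=100):
    """Value iteration over graph nodes.

    edges: node -> list of (edge_cost, successor_node); costs are assumed
    independent per edge, mirroring FIRM's principle-of-optimality property.
    """
    value = [float("inf")] * n_nodes
    value[goal] = 0.0
    for _ in range(sweeps):
        for node, outgoing in edges.items():
            value[node] = min(cost + value[nxt] for cost, nxt in outgoing)
    return value

# Hypothetical 4-node roadmap; node 3 is the goal belief node.
edges = {0: [(1.0, 1), (4.0, 2)],
         1: [(2.0, 2), (6.0, 3)],
         2: [(1.5, 3)]}
values = solve_graph_mdp(edges, goal=3, n_nodes=4)
```

The resulting `values` define a feedback law over nodes (at each node, take the minimizing edge), which is query-independent in the same spirit as the abstract's claim: reaching a different start node requires no replanning, only a lookup.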
Computed plans are reliable in the sense that the probability of violating constraints (e.g., hitting obstacles) can be seamlessly incorporated into the planning law. Moreover, FIRM is a scalable framework, as the computational complexity of its construction is linear in the size of the underlying graph, as opposed to state-of-the-art methods whose complexity is exponential in the size of the underlying graph. In addition to the abstract framework, we present concrete FIRM instantiations for three main classes of robotic systems: holonomic, nonholonomic, and non-point-stabilizable. The abstract framework opens new avenues for extending FIRM to broader classes of systems not considered in this dissertation, including systems with discrete dynamics or, in general, systems that are not well-linearizable, systems with non-Gaussian distributions, and systems with unobservable modes. In addition to the abstract framework and its concrete instantiations, we propose a formal technique for replanning with FIRM based on a rollout-policy algorithm to handle changes in the environment as well as discrepancies between the actual and computational models. We demonstrate the performance of the proposed motion planning method on different robotic systems, both in simulation and on physical systems. In the problems we consider, the system is subject to motion and sensing noise. Our results demonstrate a significant advance over existing approaches to motion planning in information space.
We believe the proposed framework takes an important step toward making information-space planners applicable to real-world robotic applications.

Item: Joint maintenance and production operations decision making in flexible manufacturing systems (2016-08)
Celen, Merve; Djurdjanovic, Dragan; Chen, Frank; Hasenbein, John J; Kutanoglu, Erhan; Morton, David P

In highly flexible and highly integrated manufacturing systems such as semiconductor manufacturing, equipment has the capability of conducting different manufacturing operations and/or producing at various speeds. In such systems, the degradation of a machine depends highly on the operations performed on it. The selection of operations executed on a piece of equipment changes its degradation dynamics and hence directly affects preventive maintenance (PM) decisions. On the other hand, PM actions interrupt production and change the system reliability and equipment availability, which in turn directly affects decisions as to which operations should be performed on which piece of equipment. These strong dynamic interactions between equipment condition, the operations executed on the equipment, and product quality necessitate a methodology that integrates the decisions of maintenance scheduling and production operations. Currently, maintenance and production operations decision-making are two decoupled processes. To address the aforementioned problems, in this dissertation we devise integrated decision-making policies for maintenance scheduling and production operations in flexible manufacturing systems (FMS), optimizing a customizable objective function that takes into account operation-dependent degradation models and production targets. The objective function consists of costs associated with scheduled and unscheduled maintenance, rewards for successfully completed products, and penalties for missed production targets.
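The structure of a customizable objective of the kind described above can be sketched in a few lines. All coefficients and counts below are invented for illustration; the abstract specifies only the cost/reward/penalty terms, not their values or functional form.

```python
def production_profit(n_scheduled_pm, n_unscheduled_pm, n_completed,
                      n_missed, c_pm=100.0, c_failure=1000.0,
                      r_product=50.0, p_missed=200.0):
    """Illustrative profit: product rewards minus maintenance costs and
    missed-target penalties. Coefficients are hypothetical, not from the work."""
    maintenance_cost = c_pm * n_scheduled_pm + c_failure * n_unscheduled_pm
    revenue = r_product * n_completed
    penalty = p_missed * n_missed
    return revenue - maintenance_cost - penalty

# One invented scenario: 3 planned PMs, 1 breakdown, 40 products, 2 misses.
profit = production_profit(n_scheduled_pm=3, n_unscheduled_pm=1,
                           n_completed=40, n_missed=2)
```

A function like this is what a simulation-based optimizer would score candidate joint schedules against; the coupling arises because the same schedule drives all four counts.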
To maximize the objective function, a paradigm based on metaheuristic optimization and evaluation of candidate solutions via discrete-event simulations of the underlying manufacturing system is used. Firstly, we propose an operation-dependent decision-making policy for a multiple-product/multiple-equipment manufacturing system, where each product requires several operations for completion and the sequence in which different product types are produced is given a priori. The proposed method is tested in simulations of a cluster tool, and the results show that operation-dependent maintenance decision-making outperforms the case where maintenance decisions are made without consideration of operation-dependent degradation dynamics. Secondly, we propose an integrated decision-making policy for maintenance scheduling and product sequencing, where the sequence in which different product types are produced can be arranged so as to maximize the customizable profit function. The results show that jointly making maintenance and production sequencing decisions consistently, and often significantly, outperforms the current practice of making these decisions separately. Finally, a joint maintenance scheduling and production operations decision-making policy is proposed for a flexible manufacturing system where the degradation states of the equipment are not perfectly observable, but are rather hidden states of a known Hidden Markov Model (HMM). The proposed integrated decision-making policy under imperfect degradation state observations is shown to consistently outperform the benchmark policies.
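The HMM view in the final study can be illustrated with a toy forward-filter step: since degradation is a hidden state, any maintenance decision must act on a filtered distribution over states rather than the state itself. The two-state model below ("healthy"/"degraded"), its matrices, and the observation sequence are all invented.

```python
def hmm_forward_step(alpha, transition, emission, obs):
    """One forward-algorithm step: update the hidden-state distribution
    after observing symbol `obs`. alpha is the current filtered pdf."""
    n = len(alpha)
    new = [emission[j][obs] * sum(transition[i][j] * alpha[i] for i in range(n))
           for j in range(n)]
    z = sum(new)                     # normalizer (observation likelihood)
    return [a / z for a in new]

T = [[0.9, 0.1],   # healthy tends to stay healthy
     [0.0, 1.0]]   # degradation persists until maintenance resets it
E = [[0.8, 0.2],   # P(sensor reading | healthy)
     [0.3, 0.7]]   # P(sensor reading | degraded)

alpha = [1.0, 0.0]            # start known-healthy
for obs in (1, 1):            # two consecutive "bad" sensor readings
    alpha = hmm_forward_step(alpha, T, E, obs)
```

After two suspicious readings the filtered belief tips toward "degraded", which is the kind of evidence an integrated policy would weigh against production targets when timing a PM action.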