Browsing by Subject "Reinforcement learning"
Showing 12 of 12 items
Item: Artificial intelligence in computer security: Detection, temporary repair and defense (2012-05)
Randrianasolo, Arisoa; Pyeatt, Larry D.; Mengel, Susan A.; Rushton, J. Nelson; Lim, Sunho

Computer security system providers are unable to provide timely security updates, and most security systems are not designed to adapt to the growing number of new threats. Companies lose considerable time and resources when security attacks manifest themselves. In answer to these problems, this research aims to develop security systems capable of learning and updating themselves, so that they mature autonomously as they are exposed to threats over time. To achieve this goal, the research draws on learning techniques from the artificial intelligence field, proposing AI-based security systems with the learning capability to perform intrusion detection, temporary repair and diagnostics, and network defense. For network intrusion detection, it proposes an artificial immune system based on Holland's classifier. A Q-learning approach provides a self-learning temporary repair and diagnostic mechanism for attacked software. Finally, a General Game Player approach serves as a network defender designed to fight unknown attackers. These approaches are trained and tested in simulations employing DARPA's dataset. Despite the need for an initial training period and heavy use of memory, they appear able to learn and are in close competition with the other approaches tested on the same dataset.
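The abstract does not detail the Q-learning repair mechanism, but the tabular Q-learning update it builds on is standard. A minimal sketch, with hypothetical repair states and actions (the dissertation's actual state and action encodings are not given here):

    import random

    # Hypothetical states and repair actions, for illustration only.
    STATES = ["service_ok", "service_degraded", "service_down"]
    ACTIONS = ["restart_process", "rollback_config", "isolate_host"]

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

    def choose_action(state):
        # Epsilon-greedy selection over candidate repair actions.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

A reward signal here would encode whether a repair restored service, so that repeated episodes bias the table toward effective repairs.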
Item: Automated domain analysis and transfer learning in general game playing (2010-08)
Kuhlmann, Gregory John; Stone, Peter, 1971-; Lifschitz, Vladimir; Mooney, Raymond J.; Porter, Bruce W.; Schaeffer, Jonathan

Creating programs that can play games such as chess, checkers, and backgammon at a high level has long been a challenge and benchmark for AI. Computer game playing is arguably one of AI's biggest success stories. Several game playing systems developed in the past, such as Deep Blue, Chinook, and TD-Gammon, have demonstrated competitive play against top human players. However, such systems are limited in that they play only one particular game, and they typically must be supplied with game-specific knowledge. While their performance is impressive, it is difficult to determine whether their success is due to generally applicable techniques or to the human game analysis behind them. A general game player is an agent capable of taking as input a description of a game's rules and proceeding to play without any subsequent human input. In doing so, the agent, rather than the human designer, is responsible for the domain analysis. Developing such a system requires the integration of several AI components, including theorem proving, feature discovery, heuristic search, and machine learning. In the general game playing scenario, the player agent is supplied with a game's rules in a formal language prior to match play. This thesis contributes a collection of general methods for analyzing these game descriptions to improve performance. Prior work on automated domain analysis has focused on generating heuristic evaluation functions for use in search. This thesis builds on that work by introducing a novel feature generation method, along with a method for generating and comparing simple evaluation functions based on these features. I describe how more sophisticated evaluation functions can be generated through learning. Finally, the thesis demonstrates the utility of domain analysis in facilitating knowledge transfer between games for improved learning speed. The contributions are fully implemented, with empirical results, in a general game playing system.

Item: Autonomous inter-task transfer in reinforcement learning domains (2008-08)
Taylor, Matthew Edmund; Stone, Peter, 1971-

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and have been shown to exhibit some desirable properties in theory, the basic learning algorithms have often been found slow in practice. Therefore, much of current RL research focuses on speeding up learning by taking advantage of domain knowledge, or by better utilizing agents' experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on some target task after training on a different, but related, source task. This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks. Transfer learning can increase RL's applicability to difficult tasks by allowing agents to generalize their experience across learning problems. This dissertation presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions. Inter-task mappings have subsequently been used by a number of transfer researchers. A set of six transfer learning algorithms is then introduced. While these transfer methods differ in what base RL algorithms they are compatible with, what type of knowledge they transfer, and what their strengths are, all utilize the same inter-task mapping mechanism. These transfer methods can all successfully use mappings constructed by a human from domain knowledge, but there may be situations in which domain knowledge is unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms. Our methods use classification and regression techniques to successfully discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer. Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains (each with different salient characteristics), show that transfer can significantly improve an agent's ability to learn in each domain, and explore the limits of transfer's applicability.
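The abstract leaves the form of an inter-task mapping abstract; at its core, such a mapping relates each target-task state variable and action to a source-task counterpart so that learned values can seed the new task. A rough sketch under that reading (all variable and action names below are invented for illustration):

    # Hypothetical mappings from a 3D target task to a 2D source task.
    state_mapping = {                 # target state variable -> source variable
        "dist_to_goal_3d": "dist_to_goal_2d",
        "angle_to_goal_3d": "angle_to_goal_2d",
    }
    action_mapping = {                # target action -> source action
        "move_forward_3d": "move_forward_2d",
        "turn_left_3d": "turn_left_2d",
    }

    def map_to_source(target_state, target_action):
        # Rename target variables to their source counterparts.
        source_state = {state_mapping[name]: value
                        for name, value in target_state.items()
                        if name in state_mapping}
        return source_state, action_mapping[target_action]

    def seed_target_q(source_q, target_state, target_action):
        # Initialize a target Q-value from the mapped source experience.
        s, a = map_to_source(target_state, target_action)
        return source_q.get((frozenset(s.items()), a), 0.0)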
Item: Autonomous qualitative learning of distinctions and actions in a developing agent (2010-08)
Mugan, Jonathan William; Kuipers, Benjamin; Stone, Peter, 1971-; Ballard, Dana; Cohen, Leslie; Mooney, Raymond

How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain-general knowledge? This thesis attacks a piece of this problem: it assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and addresses how the agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. Methods exist for learning models of the environment, and methods exist for planning; however, for autonomous learning, these methods have been used almost exclusively in discrete environments. This thesis proposes learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization: initially it can only tell whether the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment, converts these models into plans to form actions, and uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot sits at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine whether it has learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
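The broad initial discretization the abstract describes, knowing only whether each variable is increasing, decreasing, or steady, is easy to make concrete. A sketch (the tolerance eps is an assumed parameter, not one from the thesis):

    def qualitative_direction(prev_value, curr_value, eps=1e-3):
        # Map a continuous change onto the three initial qualitative values.
        delta = curr_value - prev_value
        if delta > eps:
            return "increasing"
        if delta < -eps:
            return "decreasing"
        return "steady"

    def qualitative_value(value, landmarks):
        # Refine the discretization against learned landmark values:
        # index i means the value falls below landmarks[i] (and above
        # landmarks[i-1], if any); landmarks must be sorted.
        for i, lm in enumerate(landmarks):
            if value < lm:
                return ("interval", i)
        return ("interval", len(landmarks))

Learning then amounts to choosing landmarks that make the resulting discrete predictive models more reliable.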
Item: Autonomous trading in modern electricity markets (2015-12)
Urieli, Daniel; Stone, Peter, 1971-; Mooney, Raymond; Ravikumar, Pradeep; Baldick, Ross; Kolter, Zico

The smart grid is an electricity grid augmented with digital technologies that automate the management of electricity delivery. It is envisioned as a main enabler of sustainable, clean, efficient, reliable, and secure energy supply. One milestone in the smart grid vision is programs through which customers participate in electricity markets via demand-side management and distributed generation; electricity markets will (directly or indirectly) incentivize customers to adapt their demand to supply conditions, which in turn will help to utilize intermittent energy resources such as solar and wind and to reduce peak demand. Since wholesale electricity markets are not designed for individual participation, retail brokers could represent customer populations in the wholesale market, making a profit while contributing to the grid's stability and reducing customer costs. A retail broker must operate continually and make real-time decisions in a complex, dynamic environment, and will therefore benefit from employing an autonomous broker agent.

With this motivation in mind, this dissertation makes five main contributions to the areas of artificial intelligence, smart grids, and electricity markets. First, it formalizes the problem of autonomous trading by a retail broker in modern electricity markets; since the trading problem is intractable to solve exactly, this formalization provides a guideline for approximate solutions. Second, it introduces a general algorithm for autonomous trading in modern electricity markets, named LATTE (Lookahead-policy for Autonomous Time-constrained Trading of Electricity), a general framework that can be instantiated in different ways to tailor it to specific setups. Third, it contributes fully implemented and operational autonomous broker agents, each using a different instantiation of LATTE. These agents were successful in international competitions and controlled experiments and can serve as benchmarks for future research in this domain; detailed descriptions of the agents' behaviors, as well as their source code, are included in the dissertation. Fourth, it contributes extensive empirical analysis validating the effectiveness of LATTE at different competition levels under a variety of environmental conditions, shedding light on the main reasons for its success by examining the importance of its constituent components. Fifth, it empirically examines the impact of Time-Of-Use (TOU) tariffs, which are proposed for demand-side management both in the literature and in the real world, in competitive electricity markets. The success of the different instantiations of LATTE demonstrates its generality in the context of electricity markets. Ultimately, this dissertation demonstrates that an autonomous broker can act effectively in modern electricity markets by executing an efficient lookahead policy that optimizes its predicted utility, and that by doing so the broker can benefit itself, its customers, and the economy.
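LATTE itself is specified in the dissertation; the general shape of a lookahead policy, which simulates each candidate action over a fixed horizon with learned predictive models and commits to the most profitable one, can be sketched as follows (the simulate function stands in for the broker's learned market and demand models and is an assumption here, not taken from the dissertation):

    def lookahead_policy(state, candidate_actions, simulate, horizon=24):
        # Evaluate each candidate (e.g., a tariff price) by rolling the
        # learned models forward `horizon` steps and scoring the outcome.
        best_action, best_utility = None, float("-inf")
        for action in candidate_actions:
            predicted_utility = simulate(state, action, horizon)
            if predicted_utility > best_utility:
                best_action, best_utility = action, predicted_utility
        return best_action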
Item: Cooperation and communication in multiagent deep reinforcement learning (2016-12)
Hausknecht, Matthew John; Stone, Peter, 1971-; Ballard, Dana; Mooney, Ray; Miikkulainen, Risto; Singh, Satinder

Reinforcement learning is the area of machine learning concerned with learning which actions to execute in an unknown environment in order to maximize cumulative reward. As agents begin to perform tasks of genuine interest to humans, they will be faced with environments too complex for humans to predetermine the correct actions using hand-designed solutions; capable learning agents will be necessary to tackle complex real-world domains. However, traditional reinforcement learning algorithms have difficulty with domains featuring 1) high-dimensional continuous state spaces, for example pixels from a camera image, 2) high-dimensional parameterized-continuous action spaces, 3) partial observability, and 4) multiple independent learning agents. We hypothesize that deep neural networks hold the key to scaling reinforcement learning towards complex tasks. This thesis seeks to answer the following two-part question: 1) How can the power of deep neural networks be leveraged to extend reinforcement learning to complex environments featuring partial observability, high-dimensional parameterized-continuous state and action spaces, and sparse rewards? 2) How can multiple deep reinforcement learning agents learn to cooperate in a multiagent setting? To address the first part, this thesis explores the use of recurrent neural networks to combat the partial observability agents experience in Atari 2600 video games, and then designs a deep reinforcement learning agent capable of discovering effective policies for the parameterized-continuous action space found in the Half Field Offense simulated soccer domain. To address the second part, it investigates architectures and algorithms suited to cooperative multiagent learning. We demonstrate that sharing parameters and memories between deep reinforcement learning agents fosters policy similarity, which can result in cooperative behavior. Additionally, we hypothesize that communication can further aid cooperation, and we present the Grounded Semantic Network (GSN), which learns a communication protocol grounded in the observation space and reward function of the task; we find the GSN effective in domains featuring partial observability and asymmetric information. All in all, this thesis demonstrates that reinforcement learning combined with deep neural network function approximation can produce algorithms capable of discovering effective policies in domains with partial observability, parameterized-continuous action spaces, and sparse rewards, and that single-agent deep reinforcement learning algorithms can be naturally extended to cooperative multiagent tasks featuring learned communication. These results represent a non-trivial step towards extending agent-based AI to complex environments.
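The parameter-sharing result lends itself to a small illustration: if two agents literally hold references to the same function approximator, experience from either updates the policy both follow. A toy linear stand-in for the thesis's deep networks (the feature and action sizes are arbitrary assumptions):

    import numpy as np

    class SharedQFunction:
        """A linear action-value function shared across agents."""
        def __init__(self, n_features, n_actions, lr=0.01):
            self.w = np.zeros((n_features, n_actions))
            self.lr = lr

        def q_values(self, obs):
            return obs @ self.w

        def update(self, obs, action, td_target):
            # Semi-gradient step toward a TD target for the chosen action.
            td_error = td_target - self.q_values(obs)[action]
            self.w[:, action] += self.lr * td_error * obs

    # Both agents point at the SAME object, so their policies stay
    # similar as either one learns -- the effect the thesis exploits.
    shared = SharedQFunction(n_features=8, n_actions=4)
    agent_a_q, agent_b_q = shared, shared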
Item: Exploration of an unknown space by collective robotics using fuzzy logic and reinforcement learning (Texas Tech University, 2000-05)
Pandya, Ashish K.

This thesis concerns itself with the following problem: search an area using mobile robots without the aid of human (or central) tele-operation. The robots must correctly identify the goal source, which is characterized by a maximum intensity (or favorability), and then reach the position of the goal source while incurring a low total cost (energy consumed). Scalability and usability are used as evaluation criteria for the methods used to explore the unknown search area. Two different approaches are considered. The first uses fuzzy rules [1], so that a robot, in collaboration with other robots, may use knowledge of its present state vectors to find the desired signal source. The second approach uses reinforcement learning to train the robots, via three different methodologies. The first is the simplest form of reinforcement learning, Q-learning, in which a lookup table is used to train each individual robot. The second method, the temporal difference (TD(λ)) method, is similar to a simple adaptive critic design (ACD) such as Heuristic Dynamic Programming (HDP); temporal difference learning is an elegant way of doing reinforcement learning. A simple ACD uses two neural networks, e.g., a critic and an action (control) network. The critic network learns to predict the total future cost from a given state to the terminal state, while the action network learns a policy function to optimize the critic's cost output at each state. A graphical user interface and display, plus a software-implemented simulator, are used for experimental purposes for both approaches.
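The TD(λ) method named above propagates each prediction error backwards along an eligibility trace of recently visited states. A tabular sketch of one update (the step-size and trace parameters are illustrative, not the thesis's settings):

    def td_lambda_step(V, traces, state, reward, next_state,
                       alpha=0.1, gamma=0.95, lam=0.8):
        # TD error for this transition.
        delta = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
        # Accumulate eligibility for the current state, then credit the
        # error to every recently visited state in proportion to its trace.
        traces[state] = traces.get(state, 0.0) + 1.0
        for s in list(traces):
            V[s] = V.get(s, 0.0) + alpha * delta * traces[s]
            traces[s] *= gamma * lam
        return V, traces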
Item: Learning from human-generated reward (2012-12)
Knox, William Bradley; Stone, Peter, 1971-; Ballard, Dana; Breazeal, Cynthia; Love, Bradley C; Mooney, Raymond J

Robots and other computational agents are increasingly becoming part of our daily lives. They will need to learn to perform new tasks, adapt to novel situations, and understand what is wanted by their human users, most of whom will not have programming skills. To achieve these ends, agents must learn from humans using methods of communication that are naturally accessible to everyone. This thesis presents and formalizes interactive shaping, one such teaching method, in which agents learn from real-valued reward signals generated by a human trainer. In interactive shaping, a human trainer observes an agent behaving in a task environment and delivers feedback signals; these signals are mapped to numeric values, which the agent uses to specify correct behavior. A solution to the problem of interactive shaping maps human reward to some objective such that maximizing that objective generally leads to the behavior the trainer desires.

Interactive shaping addresses the aforementioned needs of real-world agents. It allows human users to quickly teach agents the specific behaviors they desire, and humans can shape agents without programming skills or even detailed knowledge of how to perform the task themselves. In contrast, algorithms that learn autonomously from only a pre-programmed evaluative signal often learn slowly, which is unacceptable for some real-world tasks with real-world costs, and have an inflexibly defined set of optimal behaviors, changeable only through additional programming. Through interactive shaping, human users can (1) specify and teach desired behavior and (2) share task knowledge when correct behavior is already indirectly specified by an objective function. Additionally, computational agents that can be taught interactively by humans provide a unique opportunity to study how humans teach in a highly controlled setting, in which the computer agent's behavior is parametrized.

This thesis answers the following question: how, and to what extent, can agents harness the information contained in human-generated reward signals to learn sequential decision-making tasks? Its contributions begin with an operational definition of the problem of interactive shaping. Next, I introduce the TAMER framework, one solution to the problem of interactive shaping, and describe and analyze algorithmic implementations of the framework in multiple domains. The thesis also proposes and empirically examines algorithms for learning from both human reward and a pre-programmed reward function within an MDP, demonstrating two techniques that consistently outperform learning from either feedback signal alone. Subsequently, the thesis shifts its focus from the agent to the trainer, describing two psychological studies in which the trainer is manipulated, either by changing their perceived role or by having the agent intentionally misbehave at specific times; we examine the effect of these manipulations on trainer behavior and on the agent's learned task performance. Lastly, I return to the problem of interactive shaping, examining a space of mappings from human reward to objective functions in which mappings differ by how much the agent discounts reward it expects to receive in the future. Through this investigation, a deep relationship is identified between discounting, the level of positivity in human reward, and training success. Specific constraints of human reward are identified (i.e., the "positive circuits" problem), as are strategies for overcoming these constraints, pointing towards interactive shaping methods that are more effective than the already successful TAMER framework.
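The defining move of the TAMER framework is to treat human feedback not as a discounted MDP reward but as a direct label on recent behavior: the agent regresses a model of human reward H(s, a) and acts greedily with respect to it. A minimal regression sketch under that reading (the linear features and incremental update are simplifying assumptions, not the thesis's implementation):

    import numpy as np

    class HumanRewardModel:
        """Linear model of human reward H(s, a), fit online from
        (features, feedback) pairs delivered by the trainer."""
        def __init__(self, n_features, lr=0.05):
            self.w = np.zeros(n_features)
            self.lr = lr

        def predict(self, features):
            return float(self.w @ features)

        def train(self, features, human_feedback):
            # Incremental least-squares step toward the trainer's signal.
            error = human_feedback - self.predict(features)
            self.w += self.lr * error * features

    def act(model, features_by_action):
        # Greedy in predicted human reward, not in long-term return.
        return max(features_by_action,
                   key=lambda a: model.predict(features_by_action[a]))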
Item: Segbot: a multipurpose robotic platform for multi-floor navigation (2014-12)
Unwala, Ali Ishaq; Stone, Peter, 1971-

The goal of this work is to describe a robotics platform called the Building Wide Intelligence Segbot (segbot). The segbot is a two-wheeled robot that can robustly navigate our building, perform obstacle avoidance, and reason about the world. This work has two main goals. First, we introduce the segbot platform to anyone who may use it in the future: we examine the off-the-shelf components used and how to build a robot able to navigate a complex multi-floor building environment with moving obstacles, then explain the software from a top-down viewpoint, with a three-layer abstraction model for segmenting code on any robotics platform. The second part of this document describes current work on the segbot platform, which is able (non-robustly) to take requests for coffee and navigate to a coffee shop while having to move across multiple floors of a building. My contribution to this work is building an infrastructure for multi-floor navigation. The multi-floor infrastructure is not yet robust, but it has helped identify several issues that will need to be tackled in future iterations of the segbot.

Item: Structured exploration for reinforcement learning (2010-12)
Jong, Nicholas K.; Stone, Peter, 1971-; Kuipers, Benjamin; Miikkulainen, Risto; Mooney, Raymond; Singh, Satinder

Reinforcement learning (RL) offers a promising approach towards achieving the dream of autonomous agents that can behave intelligently in the real world. Instead of requiring humans to determine the correct behaviors or sufficient knowledge in advance, RL algorithms allow an agent to acquire the necessary knowledge through direct experience with its environment. Early algorithms guaranteed convergence to optimal behaviors in limited domains, giving hope that simple, universal mechanisms would allow learning agents to succeed at a wide variety of complex problems. In practice, the field of RL has struggled to apply these techniques successfully across the full breadth and depth of real-world domains. This thesis extends the reach of RL techniques by demonstrating the synergies among certain key developments in the literature. The first of these developments is model-based exploration, which facilitates theoretical convergence guarantees in finite problems by explicitly reasoning about an agent's certainty in its understanding of its environment. A second branch of research studies function approximation, which generalizes RL to infinite problems by artificially limiting the degrees of freedom in an agent's representation of its environment. The final major advance that this thesis incorporates is hierarchical decomposition, which seeks to improve the efficiency of learning by endowing an agent's knowledge and behavior with the gross structure of its environment. Each of these ideas has intuitive appeal and sustains substantial independent research efforts, but this thesis defines the first RL agent that combines all their benefits in the general case. In showing how to combine these techniques effectively, the thesis investigates the twin issues of generalization and exploration, which lie at the heart of efficient learning. It thus lays the groundwork for the next generation of RL algorithms, which will allow scientific agents to know when it suffices to estimate a plan from current data and when to accept the potential cost of running an experiment to gather new data.
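Model-based exploration of the kind described above is commonly realized by treating under-visited state-action pairs optimistically, as in R-max, so that the agent reasons explicitly about what it does not yet know. A tabular sketch of that certainty test (the threshold and optimistic value are illustrative choices, not the thesis's):

    V_MAX = 100.0          # optimistic value for insufficiently known pairs
    KNOWN_THRESHOLD = 5    # samples needed before trusting the model

    counts = {}            # (state, action) -> number of observed samples

    def record_transition(state, action):
        counts[(state, action)] = counts.get((state, action), 0) + 1

    def optimistic_q(state, action, learned_q):
        # Unknown pairs look maximally attractive, so planning with these
        # values systematically drives the agent to reduce its uncertainty.
        if counts.get((state, action), 0) < KNOWN_THRESHOLD:
            return V_MAX
        return learned_q.get((state, action), 0.0)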
Item: Texplore: temporal difference reinforcement learning for robots and time-constrained domains (2012-12)
Hester, Todd; Stone, Peter, 1971-; Mooney, Raymond; Miikkulainen, Risto; Ballard, Dana; Littman, Michael

Robots have the potential to solve many problems in society because of their ability to work in dangerous places, doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision-making processes that could solve the problems of learning and adaptation on robots. This dissertation identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuous state features; 3) it must handle sensor and/or actuator delays; and 4) it should continually select actions in real time. This dissertation addresses all four of these challenges, focusing in particular on time-constrained domains where the first challenge is critically important: the agent's lifetime is not long enough for it to explore the domain thoroughly, and it must learn in very few samples. Although existing RL algorithms successfully address one or more of the RL for Robotics Challenges, no prior algorithm addresses all four. To fill this gap, this dissertation introduces TEXPLORE, the first algorithm to address all four challenges. TEXPLORE is a model-based RL method that learns a random forest model of the domain, which generalizes dynamics to unseen states. Each tree in the random forest model represents a hypothesis of the domain's true dynamics, and the agent uses these hypotheses to explore states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real time whenever necessary. We empirically evaluate each component of TEXPLORE in comparison with other state-of-the-art approaches, and we present modifications of TEXPLORE's exploration mechanism for different types of domains. The key result of this dissertation is a demonstration of TEXPLORE learning to control the velocity of an autonomous vehicle online, in real time, while running on board the robot: after controlling the vehicle for only two minutes, TEXPLORE learns to move the pedals of the vehicle to drive at the desired velocities. The work presented in this dissertation represents an important step towards applying RL to robotics and enabling robots to perform more tasks in society. By enabling robots to learn in few actions, while acting online in real time, on robots with continuous state and actuator delays, TEXPLORE significantly broadens the applicability of RL to robots.

Item: The development of bias in perceptual and financial decision-making (2014-08)
Chen, Mei-Yen, Ph.D.; Poldrack, Russell A.; Maddox, W. Todd; Huk, Alexander C.; Pillow, Jonathan; Dhillon, Inderjit S.

Decisions are prone to bias. This can be seen in daily choices: for instance, when the markets are plunging, investors tend to sell stocks instead of purchasing them at lower prices, because people are in general more sensitive to potential losses than to potential gains, or loss averse, in making financial choices. It can also be seen in laboratory tests: when participants receive higher payoffs for successfully discriminating a visual stimulus as one choice against the other, they begin choosing this higher-rewarded option more often, even when the objective evidence indicates the alternative. In my dissertation, I used mathematical models and functional magnetic resonance imaging (fMRI) to track the development of bias in perceptual and financial decision-making, and I present evidence characterizing the experience-sensitive and domain-general decision-making process in the human brain. The first chapter shows that bias can be developed by associating decision contexts with reward feedback from trial to trial in perceptual decision-making. Although the surface task differed, this learning process involved the same prediction-error-driven mechanisms, implemented in the dopaminergic system, as in financial decision-making. Furthermore, the frontal cortex increased its strength of connection between visual and value systems, accounting for the growth of perceptual bias. The second chapter extends this feedback-driven acquisition process to examine the influence of experience on loss aversion in financial decision-making. The results show that people learned to make riskier or more conservative decisions according to the feedback they had received in different decision contexts; this alteration in loss aversion was achieved through modulation of the value system's sensitivity toward potential gains during evaluation, with the frontal cortex mediating the change. The third chapter uses a mathematical model to identify changes in financial decision-making that occur faster than the temporal resolution of fMRI; the results suggest that people may simplify financial information into rules of thumb for making a choice. These findings not only integrate knowledge across different domains of decision neuroscience but also shed light on how one may refine the decision-making process through experience.
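Loss aversion of the kind studied in the second chapter is standardly modeled with a prospect-theory value function that weighs losses more steeply than gains. A sketch with textbook parameter values (these are Kahneman and Tversky's median estimates, not values fit in the dissertation):

    def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
        # Concave for gains, convex and steeper (by factor lam) for losses.
        if x >= 0:
            return x ** alpha
        return -lam * ((-x) ** beta)

    # A 50/50 gamble of +$10 / -$10 has negative subjective value under
    # loss aversion, which is why such gambles are often rejected:
    ev = 0.5 * prospect_value(10) + 0.5 * prospect_value(-10)   # about -4.7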