Browsing by Subject "Machine learning"
Now showing 1 - 20 of 59
Item Active visual category learning (2011-05)
Vijayanarasimhan, Sudheendra; Grauman, Kristen Lorraine, 1979-; Dhillon, Inderjit S.; Aggarwal, J. K.; Mooney, Raymond J.; Torralba, Antonio
Visual recognition research develops algorithms and representations to autonomously recognize visual entities such as objects, actions, and attributes. The traditional protocol involves manually collecting training image examples, annotating them in specific ways, and then learning models to explain the annotated examples. However, this is a rather limited way to transfer human knowledge to visual recognition systems, particularly considering the immense number of visual concepts that are to be learned. I propose new forms of active learning that facilitate large-scale transfer of human knowledge to visual recognition systems in a cost-effective way. The approach is cost-effective in the sense that the division of labor between the machine learner and the human annotators respects any cues regarding which annotations would be easy (or hard) for either party to provide. The approach is large-scale in that it can deal with a large number of annotation types, multiple human annotators, and huge pools of unlabeled data. In particular, I consider three important aspects of the problem: (1) cost-sensitive multi-level active learning, where the expected informativeness of any candidate image annotation is weighed against the predicted cost of obtaining it, in order to choose the best annotation at every iteration; (2) budgeted batch active learning, a novel active learning setting well suited to automatic learning from crowd-sourcing services, where there are multiple annotators and each annotation task may vary in difficulty; and (3) sub-linear time active learning, where one must retrieve the points most informative to a classifier in time that is sub-linear in the number of unlabeled examples, i.e., without exhaustively scanning the entire collection. Using the proposed solutions for each aspect, I then demonstrate a complete end-to-end active learning system for scalable, autonomous, online learning of object detectors. The approach provides state-of-the-art recognition and detection results while using minimal total manual effort. Overall, my work enables recognition systems that continuously improve their knowledge of the world by learning to ask the right questions of human supervisors.

Item Adaptive trading agent strategies using market experience (2011-05)
Pardoe, David Merrill; Stone, Peter, 1971-; Miikkulainen, Risto; Mooney, Raymond; Saar-Tsechansky, Maytal; Wellman, Michael
Along with the growth of electronic commerce has come an interest in developing autonomous trading agents. Often, such agents must interact directly with other market participants, and so the behavior of these participants must be taken into account when designing agent strategies. One common approach is to build a model of the market, but this requires historical market data, which may not always be available. This dissertation addresses such a case: that of an agent entering a new market in which it has no previous experience. While the agent could adapt by learning about the behavior of other market participants, it would need to do so in an online fashion. The agent would not necessarily have to learn from scratch, however. If the agent had previous experience in similar markets, it could use this experience to tailor its learning approach to its particular situation.
This dissertation explores methods that a trading agent could use to take advantage of previous market experience when adapting to a new market. Two distinct learning settings are considered. In the first, an agent acting as an auctioneer must adapt the parameters of an auction mechanism in response to bidder behavior, and a reinforcement learning approach is used. The second setting concerns agents that must adapt to the behavior of competitors in two scenarios from the Trading Agent Competition: supply chain management and ad auctions. Here, the agents use supervised learning to model the market. In both settings, methods of adaptation can be divided into four general categories: i) identifying the most similar previously encountered market, ii) learning from the current market only, iii) learning from the current market but using previous experience to tune the learning algorithm, and iv) learning from both the current and previous markets. The first contribution of this dissertation is the introduction and experimental validation of a number of novel algorithms for market adaptation fitting these categories. The second contribution is an exploration of the degree to which the quantity and nature of market experience impact the relative performance of methods from these categories.

Item The application of machine learning methods in software verification and validation (2010-08)
Phuc, Nguyen Vinh, 1955-; Ghosh, Joydeep; Khurshid, Sarfraz
Machine learning methods have been employed in data mining to discover useful, valid, and beneficial patterns for applications in domains that include business, medicine, agriculture, census, and software engineering. Focusing on software engineering, this report presents an investigation of machine learning techniques that have been used to predict programming faults during the verification and validation of software. Artifacts such as traces of program executions, information about test case coverage, and data pertaining to execution failures are of special interest for addressing the following concerns: completeness of test suite coverage; automation of test oracles to reduce human intervention in software testing; detection of faults causing program failures; and defect prediction in software. A survey of the literature on software verification and validation also revealed a novel concept designed to improve black-box testing using Category-Partition for test specifications and test suites. The report includes two experiments using data extracted from source code available from the website (15) to demonstrate the application of a decision tree (C4.5) and the multilayer perceptron to fault prediction, as well as an example that shows a potential candidate for the Category-Partition scheme. The results from several research projects show that the application of machine learning in software testing has achieved various degrees of success in effectively assisting software developers to improve their test strategies for the verification and validation of software systems.

Item Artificial intelligence in computer security: Detection, temporary repair and defense (2012-05)
Randrianasolo, Arisoa; Pyeatt, Larry D.; Mengel, Susan A.; Rushton, J. Nelson; Lim, Sunho
Computer security system providers are unable to provide timely security updates. Most security systems are not designed to adapt to the increasing number of new threats, and companies lose considerable time and resources when security attacks manifest themselves.
In answer to these problems, this research aims to develop security systems capable of learning and updating themselves, with the goal of creating security systems that autonomously mature with exposure to threats over time. To achieve this goal, the research draws on learning techniques from the field of artificial intelligence, proposing AI-based security systems with the learning capability to perform intrusion detection, temporary repair and diagnostics, and network defense. For network intrusion detection, an Artificial Immune System based on Holland's Classifier is proposed. A Q-learning approach is proposed to provide a self-learning temporary repair and diagnostic mechanism for attacked software. Finally, a General Game Player approach is used as a network defender designed to fight unknown attackers. These approaches are trained and tested in simulations employing DARPA's dataset. Despite the need for an initial training period and heavy use of memory, they appear to have the ability to learn and are in close competition with the other approaches tested on the same dataset.

Item Automated domain analysis and transfer learning in general game playing (2010-08)
Kuhlmann, Gregory John; Stone, Peter, 1971-; Lifschitz, Vladimir; Mooney, Raymond J.; Porter, Bruce W.; Schaeffer, Jonathan
Creating programs that can play games such as chess, checkers, and backgammon at a high level has long been a challenge and benchmark for AI, and computer game playing is arguably one of AI's biggest success stories. Several game playing systems developed in the past, such as Deep Blue, Chinook, and TD-Gammon, have demonstrated competitive play against top human players. However, such systems are limited in that they play only one particular game, and they typically must be supplied with game-specific knowledge. While their performance is impressive, it is difficult to determine whether their success is due to generally applicable techniques or to human game analysis. A general game player is an agent capable of taking as input a description of a game's rules and proceeding to play without any subsequent human input. In doing so, the agent, rather than the human designer, is responsible for the domain analysis. Developing such a system requires the integration of several AI components, including theorem proving, feature discovery, heuristic search, and machine learning. In the general game playing scenario, the player agent is supplied with a game's rules in a formal language prior to match play. This thesis contributes a collection of general methods for analyzing these game descriptions to improve performance. Prior work on automated domain analysis has focused on generating heuristic evaluation functions for use in search. This thesis builds on that work by introducing a novel feature generation method, along with a method for generating and comparing simple evaluation functions based on these features, and describes how more sophisticated evaluation functions can be generated through learning. Finally, this thesis demonstrates the utility of domain analysis in facilitating knowledge transfer between games for improved learning speed.
All of these contributions are fully implemented, with empirical results demonstrated in a complete general game playing system.

Item Autonomous qualitative learning of distinctions and actions in a developing agent (2010-08)
Mugan, Jonathan William; Kuipers, Benjamin; Stone, Peter, 1971-; Ballard, Dana; Cohen, Leslie; Mooney, Raymond
How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain-general knowledge? This thesis attacks a piece of this problem: it assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and poses a solution to the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. Methods exist for learning models of the environment, and methods exist for planning; however, for autonomous learning these methods have been used almost exclusively in discrete environments. This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell whether the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment, converts these models into plans to form actions, and uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot sits at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine whether it learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.

Item Autonomous sensor and action model learning for mobile robots (2008-08)
Stronger, Daniel Adam; Stone, Peter, 1971-
Autonomous mobile robots have the potential to be extremely beneficial to society due to their ability to perform tasks that are difficult or dangerous for humans. These robots necessarily interact with their environment through the two fundamental processes of acting and sensing: robots learn about the state of the world around them through their sensations, and they influence that state through their actions. However, in order to interact with their environment effectively, these robots must have accurate models of their sensors and actions: knowledge of what their sensations say about the state of the world and how their actions affect that state. A mobile robot's action and sensor models are typically tuned manually, a brittle and laborious process. The robot's actions and sensors may change over time from wear or because of a novel environment's terrain or lighting, so it is valuable for the robot to be able to learn these models autonomously. This dissertation presents a methodology that enables mobile robots to learn their action and sensor models starting without an accurate estimate of either model. The methodology is instantiated in three robotic scenarios.
First, an algorithm is presented that enables an autonomous agent to learn its action and sensor models in a class of one-dimensional settings. Experimental tests are performed on a four-legged robot, the Sony Aibo ERS-7, walking forward and backward at different speeds while facing a fixed landmark. Second, a probabilistically motivated model learning algorithm is presented that operates on the same robot walking in two dimensions with arbitrary combinations of forward, sideways, and turning velocities. Finally, an algorithm is presented to learn the action and sensor models of a very different mobile robot, an autonomous car.

Item Autonomous trading in modern electricity markets (2015-12)
Urieli, Daniel; Stone, Peter, 1971-; Mooney, Raymond; Ravikumar, Pradeep; Baldick, Ross; Kolter, Zico
The smart grid is an electricity grid augmented with digital technologies that automate the management of electricity delivery. The smart grid is envisioned to be a main enabler of sustainable, clean, efficient, reliable, and secure energy supply. One of the milestones in the smart grid vision will be programs for customers to participate in electricity markets through demand-side management and distributed generation: electricity markets will (directly or indirectly) incentivize customers to adapt their demand to supply conditions, which in turn will help to utilize intermittent energy resources such as solar and wind and to reduce peak demand. Since wholesale electricity markets are not designed for individual participation, retail brokers could represent customer populations in the wholesale market, making a profit while contributing to the electricity grid's stability and reducing customer costs. A retail broker will need to operate continually and make real-time decisions in a complex, dynamic environment, and will therefore benefit from employing an autonomous broker agent. With this motivation in mind, this dissertation makes five main contributions to the areas of artificial intelligence, smart grids, and electricity markets. First, it formalizes the problem of autonomous trading by a retail broker in modern electricity markets; since the trading problem is intractable to solve exactly, this formalization provides a guideline for approximate solutions. Second, it introduces a general algorithm for autonomous trading in modern electricity markets, named LATTE (Lookahead-policy for Autonomous Time-constrained Trading of Electricity), a general framework that can be instantiated in different ways to tailor it to specific setups. Third, it contributes fully implemented and operational autonomous broker agents, each using a different instantiation of LATTE. These agents were successful in international competitions and controlled experiments and can serve as benchmarks for future research in this domain; detailed descriptions of the agents' behaviors, as well as their source code, are included in the dissertation. Fourth, it contributes extensive empirical analysis validating the effectiveness of LATTE at different competition levels and under a variety of environmental conditions, shedding light on the main reasons for its success by examining the importance of its constituent components. Fifth, it examines the impact of Time-Of-Use (TOU) tariffs in competitive electricity markets through empirical analysis; TOU tariffs are proposed for demand-side management both in the literature and in the real world.
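
The lookahead idea at the heart of LATTE can be pictured with a toy sketch: simulate predicted profit over a horizon for each candidate action and commit to the utility-maximizing one. Everything below (the demand and cost models, the rate grid, the function names) is a hypothetical stand-in for illustration, not the dissertation's models or code.

    # Toy lookahead policy: choose a retail tariff rate by simulating
    # predicted profit over a fixed horizon. All models here are hypothetical.
    def predicted_demand(rate, hour):
        # demand falls with price; hypothetical evening peak at hour 18
        return max(0.0, 100.0 - 40.0 * rate + 10.0 * (hour % 24 == 18))

    def predicted_wholesale_cost(hour):
        # hypothetical wholesale cost that spikes during evening peak hours
        return 0.05 + (0.02 if 17 <= hour % 24 < 21 else 0.0)

    def lookahead_utility(rate, horizon=48):
        # predicted profit from committing to `rate` over the horizon
        return sum(predicted_demand(rate, h) * (rate - predicted_wholesale_cost(h))
                   for h in range(horizon))

    candidate_rates = [r / 100.0 for r in range(5, 31)]  # 0.05 ... 0.30 $/kWh
    best_rate = max(candidate_rates, key=lookahead_utility)
    print(f"chosen tariff rate: {best_rate:.2f} $/kWh")
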
The success of the different instantiations of LATTE demonstrates its generality in the context of electricity markets. Ultimately, this dissertation demonstrates that an autonomous broker can act effectively in modern electricity markets by executing an efficient lookahead policy that optimizes its predicted utility, and that by doing so the broker can benefit itself, its customers, and the economy.

Item Catweetegories : machine learning to organize your Twitter stream (2013-12)
Simoes, Christopher Francis; Aziz, Adnan
We set out to create a web service that uses machine learning to help users better organize the flood of tweets they receive every day. We experimented with ways to manually classify training sets of tweets, such as using Amazon's Mechanical Turk and crawling the Internet for large quantities of tweets. Once we had acquired good training data, we began building a classifier. We tried NLTK and Stanford NLP as libraries for creating a classifier, and ultimately created a classifier that is 87.5% accurate. We then built a web service to expose this classifier and allow any user on the Internet to organize their tweets. We built the web service with many open source tools, and we discuss how we integrated those tools to create a production-quality web service. We run the service in the Amazon cloud and review the associated costs. Finally, we review the lessons we learned and share our thoughts on further work we would like to do in the future.

Item Creating diverse ensemble classifiers to reduce supervision (2005)
Melville, Prem Noel; Mooney, Raymond J. (Raymond Joseph)
Ensemble methods like Bagging and Boosting, which combine the decisions of multiple hypotheses, are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging, and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise. DECORATE ensembles can also be used to reduce supervision through active learning, in which the learner selects the most informative examples from a pool of unlabeled examples, such that acquiring their labels will increase the accuracy of the classifier. Query by Committee is one effective approach to active learning in which disagreement within the ensemble of hypotheses is used to select examples for labeling.
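
Since the paragraph above describes Query by Committee selection by disagreement, here is a minimal sketch of that idea using vote entropy over a bootstrap committee (in the spirit of Query by Bagging, discussed next). scikit-learn and the synthetic data are assumptions for illustration only; DECORATE constructs its committees differently, via artificially generated examples.

    # Minimal Query-by-Committee sketch: query the pool example on which a
    # bootstrap committee disagrees most (vote entropy). Illustrative only;
    # assumes scikit-learn and synthetic two-class data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, random_state=0)
    labeled = np.arange(20)            # small labeled seed set
    pool = np.arange(20, 600)          # unlabeled pool
    rng = np.random.default_rng(0)
    committee = []
    for _ in range(7):                 # bootstrap replicates, as in Query by Bagging
        idx = rng.choice(labeled, size=labeled.size, replace=True)
        committee.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

    votes = np.stack([m.predict(X[pool]) for m in committee])  # shape (7, n_pool)
    p1 = votes.mean(axis=0)                                    # fraction voting class 1
    eps = 1e-12                                                # avoid log(0)
    vote_entropy = -(p1 * np.log(p1 + eps) + (1 - p1) * np.log(1 - p1 + eps))
    query = pool[int(np.argmax(vote_entropy))]                 # most contested example
    print("next example to label:", query)
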
Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For efficient active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. Since DECORATE explicitly builds such committees, it is well suited for this task. We introduce a new algorithm, ACTIVEDECORATE, which uses DECORATE committees to select good training examples. Experimental results demonstrate that ACTIVEDECORATE typically requires labeling fewer examples to achieve the same accuracy as Query by Bagging and Query by Boosting. Apart from optimizing classification accuracy, producing good class probability estimates is also important in many applications, e.g., in fraud detection, which has unequal misclassification costs. This thesis introduces a novel approach to active learning based on ACTIVEDECORATE which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation. Comprehensive experimental results demonstrate the benefits of our approach. Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach, based on DECORATE, in which instances are selected for acquisition based on the current model's accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than random sampling.

Item Crunch the market : a Big Data approach to trading system optimization (2013-12)
Mauldin, Timothy Allan; Aziz, Adnan
Due to the size of the data involved, software for analyzing and tuning intraday trading strategies can take large amounts of time away from analysts, who would like to evaluate strategies and optimize strategy parameters very quickly, ideally in the blink of an eye. Fortunately, Big Data technologies are evolving rapidly and can be leveraged for these purposes. These technologies include software systems for distributed computing, parallel hardware, and on-demand computing resources in the cloud. This report presents a distributed software system for trading strategy analysis. It also demonstrates the effectiveness of machine learning techniques in decreasing parameter optimization workload.
The results of tests run on two different commercial cloud service providers show linear scalability when analyzing intraday trading strategies.

Item Data driven analysis of fast oxide ion diffusion in solid oxide fuel cell cathodes (2015-08)
Miller, Alexander Scot; Benedek, Nicole; Yu, Guihua
The goal of this study was to determine whether atomic-scale features (related to composition and crystal structure) of perovskite and perovskite-related materials could be used to predict fast oxide ion diffusion for Solid Oxide Fuel Cell (SOFC) applications; materials that can be used as SOFC cathodes were a particular focus. One hundred twenty-six pairs of diffusion (D*) and surface exchange (k*) coefficients for a variety of materials were collected from literature sources published between 1991 and 2015, and a website was created to make these data publicly viewable. Statistical tests revealed that diffusion measurements differ significantly at 400 K, 700 K, and 1000 K when grouped according to material family and sample type. Models predicting diffusion rates were created from atomic-scale features at several temperatures between 400 K and 1000 K. Perovskite and double-perovskite models explained more than 85% of the variance in ln(D*k*) at 800 K to 1000 K, meaning the atomic-scale features captured most of the observed variation in these quantities; at lower temperatures (400 K to 700 K) the models explained 55% to 75% of the variance. Materials whose B-site cations had the highest electron affinities showed the fastest diffusion at all temperatures. These models therefore suggest that using B-site cations with high electron affinities (i.e., atoms that are easily reduced) may increase fuel cell performance, even at low and intermediate temperatures.

Item Data mining techniques for classifying RNA folding structures (2016-08)
Kim, Vince; Garg, Vijay K. (Vijay Kumar), 1963-; Gutell, Robin R.
RNA is a crucial biological molecule that is critical for protein synthesis. Significant research has been done on folding algorithms for RNA, in particular the 16S rRNA of bacteria and archaea. Rather than modifying current work on these folding algorithms, this report ventures into pioneering work on data mining the same 16S rRNA. Initial work was based on a single complex helix across seven organisms; however, classification analysis proved inaccurate due to severe multicollinearity in the data set. A second data mining analysis, performed on the entire RNA sequence of the same seven organisms, was successfully used to sequentially predict the categorical characteristics of a given nucleotide in the RNA sequence.

Item Detection and Diagnosis of Out-of-Specification Failures in Mixed-Signal Circuits (2014-12-03)
Mukherjee, Parijat
Verifying whether a circuit meets its intended specifications, as well as diagnosing the circuits that do not, is indispensable at every stage of integrated circuit design. Otherwise, a significant portion of fabricated circuits could fail or behave correctly only under certain conditions. Shrinking process technologies and increased integration have further complicated this task. This is especially true of mixed-signal circuits, where a slight parametric shift in an analog component can change the output significantly. We are thus rapidly approaching a proverbial wall, where migrating existing circuits to advanced technology nodes and/or designing next-generation circuits may not be possible without suitable verification and debug strategies.
Traditional approaches target accuracy rather than scalability, which prevents their use on high-dimensional systems. Relaxing the accuracy requirement mitigates the computational cost, while quantifying the level of inaccuracy retains the effectiveness of these metrics. We exercise this accuracy vs. turn-around-time trade-off to deal with multiple mixed-signal problems across both the pre- and post-silicon domains. We first obtain approximate failure probability estimates along with their confidence bands using limited simulation budgets. We then generate "failure regions" that naturally explain the parametric interactions resulting in predicted failures. These two pre-silicon contributions together enable us to estimate and reduce the failure probability, which we demonstrate on a high-dimensional phase-locked loop test case. We leverage this pre-silicon knowledge in test-set selection and post-silicon debug to alleviate the limited controllability and observability of the post-silicon domain. We select a set of test points that maximizes the probability of observing failures, and then use post-silicon measurements at these test points to identify systematic deviations from pre-silicon belief. This is demonstrated on the phase-locked loop test case, where we boost the number of failures to observable levels and use the obtained measurements to root-cause the underlying parametric shifts. The pre-silicon contributions can also be extended to perform equivalence checking and to help diagnose detected model mismatches; the resulting calibrated model allows us to apply our work at the system level as well. The equivalence checking and model-mismatch diagnosis are successfully demonstrated using a high-level abstraction model of the phase-locked loop test case.

Item Developing a real-time freeway incident detection model using machine learning techniques (2016-05)
Motamed, Moggan; Machemehl, Randy B.; Walton, Michael; Zhang, Zhanmin; Boyles, Stephen; Caldas, Carlos; Barnes, Wesley
Real-time incident detection on freeways plays an important part in any modern traffic management operation by maximizing road system performance. The US Department of Transportation (US-DOT) estimates that over half of all congestion events are caused by highway incidents rather than by rush-hour traffic in big cities. An effective incident detection and management operation cannot prevent incidents; however, it can diminish the impacts of non-recurring congestion problems. The main purpose of real-time incident detection is to reduce delay and the number of secondary accidents, and to improve safety and travel information during unusual traffic conditions. The majority of automatic incident detection algorithms focus on identifying traffic incident patterns but do not adequately investigate possible similarities in patterns observed under incident-free conditions. When traffic demand exceeds road capacity, density exceeds critical values and traffic speed decreases, and the traffic flow process enters a highly unstable regime, often referred to as "stop-and-go" conditions. The most challenging part of real-time incident detection is recognizing traffic pattern changes when incidents happen during stop-and-go conditions. Recently, short-term freeway congestion detection algorithms have been proposed as solutions to real-time incident detection, using procedures known as dynamic time warping (DTW) and the support vector machine (SVM).
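
For readers unfamiliar with DTW, the core of the procedure named above is a simple dynamic program that aligns two time series of possibly different lengths. The following is a textbook sketch with made-up speed values, not the dissertation's implementation:

    # Illustrative dynamic time warping (DTW) distance between two traffic-speed
    # series; a textbook sketch, not the dissertation's algorithm.
    def dtw_distance(a, b):
        n, m = len(a), len(b)
        inf = float("inf")
        # cost[i][j] = best alignment cost of a[:i] and b[:j]
        cost = [[inf] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                     cost[i][j - 1],      # deletion
                                     cost[i - 1][j - 1])  # match
        return cost[n][m]

    normal = [62, 61, 60, 58, 59, 60]       # incident-free speeds (mph), hypothetical
    observed = [61, 60, 45, 30, 28, 35]     # speeds during a suspected incident
    print(dtw_distance(normal, observed))   # a large distance flags an anomaly
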
Some studies have shown these procedures to produce higher detection rates than artificial intelligence (AI) algorithms, with lower false alarm rates. The proposed methods combine data mining and time series classification techniques; such methods are interdisciplinary efforts at the confluence of several disciplines, including statistics, machine learning, artificial intelligence, and information science. A literature review of the methodology and application of these two models is presented in the following chapters. SVM, Naïve Bayes (NB), and Random Forest classifier models incorporating temporal data and an ensemble technique achieve improved detection rates, compared with the original SVM model, by optimizing the parameter thresholds. The main purpose of this dissertation is to examine the most robust algorithms (DTW, SVM, Naïve Bayes, Decision Tree, SVM Ensemble) and to develop a generalized automatic incident detection algorithm characterized by high detection rates and low false alarm rates during peak hours. The transferability of the developed incident detection model was tested using the Dallas and Miami field datasets. Even though the primary services of urban traffic control centers include detecting incidents and facilitating incident clearance, estimating freeway incident durations remains a significant incident management challenge for traffic operations centers. As a next step, this study examines the effect of the V/C (volume/capacity) ratio, level of service (LOS), weather conditions, detection mode, number of involved lanes, and incident type on the duration of traffic incidents. The results of this effort can benefit traffic control centers by improving the accuracy of estimated incident durations, thereby improving the authenticity of traveler guidance information.

Item Dirty statistical models (2012-05)
Jalali, Ali, 1982-; Sanghavi, Sujay Rajendra, 1979-; Caramanis, Constantine; Ghosh, Joydeep; Dhillon, Inderjit; Ravikumar, Pradeep
In fields across science and engineering, we are increasingly faced with problems where the number of variables or features we need to estimate is much larger than the number of observations. Under such high-dimensional scaling, for any hope of statistically consistent estimation, it becomes vital to leverage any potential structure in the problem, such as sparsity, low-rank structure, or block sparsity. However, data may deviate significantly from any one such statistical model. The motivating question of this thesis is: can we simultaneously leverage more than one such statistical structural model, to obtain consistency in a larger number of problems, and with fewer samples, than can be obtained with single models? Our approach involves combining structures via simple linear superposition, a technique we term dirty models. The idea is very simple: while any one structure might not capture the data, a superposition of structural classes might. Dirty models thus search for a parameter that can be decomposed into a number of simpler structures, such as (a) sparse plus block-sparse, (b) sparse plus low-rank, and (c) low-rank plus block-sparse. In this thesis, we propose dirty-model-based algorithms for problems such as multi-task learning, graph clustering, and time-series analysis with latent factors, and we analyze these algorithms in terms of the number of observations needed to estimate the variables. These algorithms are based on convex optimization, and sometimes they are relatively slow.
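
To make the superposition idea concrete, here is a hedged sketch of one of the decompositions listed above, sparse plus low-rank, computed by block coordinate descent on the convex objective 0.5*||M - S - L||_F^2 + lam_s*||S||_1 + lam_l*||L||_*. The data and regularization parameters are illustrative assumptions, not the thesis's algorithms:

    # Sketch of a "sparse plus low-rank" dirty-model decomposition via
    # alternating proximal updates. Parameters and data are hypothetical.
    import numpy as np

    def soft_threshold(X, t):
        # proximal operator of t * ||X||_1 (entrywise shrinkage)
        return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

    def svd_shrink(X, t):
        # proximal operator of t * ||X||_* (singular value shrinkage)
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

    def sparse_plus_low_rank(M, lam_s=0.1, lam_l=1.0, iters=100):
        S = np.zeros_like(M)
        L = np.zeros_like(M)
        for _ in range(iters):
            S = soft_threshold(M - L, lam_s)  # exact minimizer over S, L fixed
            L = svd_shrink(M - S, lam_l)      # exact minimizer over L, S fixed
        return S, L

    rng = np.random.default_rng(0)
    low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 50))  # rank 2
    sparse = np.where(rng.random((50, 50)) < 0.05, 5.0, 0.0)        # few spikes
    S_hat, L_hat = sparse_plus_low_rank(low_rank + sparse)

Each update is the exact proximal step for its block (soft-thresholding for the l1 term, singular-value shrinkage for the nuclear norm), which keeps the convex formulation simple but can make it slow at scale.
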
To address this, we provide a class of low-complexity greedy algorithms that not only solve these optimization problems faster, but also come with guarantees on the solution. Beyond the theoretical results, in each case we provide experimental results to illustrate the power of dirty models.

Item Discriminative object categorization with external semantic knowledge (2013-08)
Hwang, Sung Ju; Grauman, Kristen Lorraine, 1979-
Visual object category recognition is one of the most challenging problems in computer vision. Even assuming that we can obtain a near-perfect instance-level representation with advances in visual input devices and low-level vision techniques, object categorization still remains a difficult problem, because it requires drawing boundaries between instances in a continuous world, where the boundaries are solely defined by human conceptualization. Object categorization is essentially a perceptual process that takes place in a human-defined semantic space. In this semantic space, the categories reside not in isolation but in relation to one another: some categories are similar, grouped, or co-occurring, and some are not. Despite this semantic nature of object categorization, most of today's automatic visual category recognition systems rely only on the category labels when training discriminative recognition models with statistical machine learning techniques. In many cases, this can mislead the recognition model into learning incorrect associations between visual features and semantic labels, essentially overfitting to training set biases, which limits the model's predictive power on new test instances. Using semantic knowledge has great potential to benefit object category recognition. First, semantic knowledge can guide the model toward correct associations between visual features and categories. Second, semantics provide much richer information than the membership information given by the labels, in the form of inter-category and category-attribute distances, relations, and structures. Finally, semantic knowledge scales well, as the relations between categories grow richer with an increasing number of categories. My goal in this thesis is to learn discriminative models for categorization that leverage semantic knowledge for object recognition, with a special focus on the semantic relationships among different categories and concepts. To this end, I explore three semantic sources, namely attributes, taxonomies, and analogies, and show how to incorporate them into the original discriminative model as a form of structural regularization: for each form of semantic knowledge I present a feature learning approach that defines a semantic embedding to support the object categorization task, where the regularization penalizes models that deviate from the structures given by the semantic knowledge. The first semantic source I explore is attributes, which are human-describable semantic characteristics of an instance. While existing work treated attributes as mid-level features that did not introduce new information, I focus on their potential as a means to better guide the learning of object categories, by enforcing the object category classifiers to share features with attribute classifiers in a multitask feature learning framework. This approach essentially discovers the common low-dimensional features that support predictions in both semantic spaces.
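
A minimal sketch of the multitask feature learning idea above, which forces category and attribute predictors to share a small common set of features via an l2,1 (row-sparsity) penalty, might look as follows. The synthetic data, parameters, and the simple proximal-gradient solver are assumptions for illustration, not the thesis's formulation:

    # Sketch: multitask feature learning with an l2,1 regularizer that selects
    # features shared across tasks (e.g., categories and attributes).
    import numpy as np

    def row_group_shrink(W, t):
        # proximal operator of t * sum_j ||W[j,:]||_2 : shrinks whole rows
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        return W * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)

    def multitask_feature_learning(X, Y, lam=0.1, iters=500):
        n, d = X.shape
        W = np.zeros((d, Y.shape[1]))           # one weight column per task
        step = n / np.linalg.norm(X, 2) ** 2    # 1/L for the smooth part
        for _ in range(iters):
            grad = X.T @ (X @ W - Y) / n        # gradient of (0.5/n)*||XW - Y||_F^2
            W = row_group_shrink(W - step * grad, step * lam)
        return W

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    W_true = np.zeros((30, 5))
    W_true[:4] = rng.normal(size=(4, 5))        # all 5 tasks share 4 active features
    Y = X @ W_true + 0.1 * rng.normal(size=(200, 5))
    W_hat = multitask_feature_learning(X, Y)
    print("shared features:", np.flatnonzero(np.linalg.norm(W_hat, axis=1) > 1e-6))
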
Then, I move on to semantic taxonomies, another valuable source of semantic knowledge. The merging and splitting criteria for the categories in a taxonomy are human-defined, and I aim to exploit this implicit semantic knowledge. Specifically, I propose a tree of metrics (ToM) that learns metrics capturing granularity-specific similarities at the different nodes of a given semantic taxonomy, using a regularizer to isolate granularity-specific disjoint features. This captures the intuition that the features used to discriminate a parent class should differ from the features used for its children classes, and the learned metrics can be used for hierarchical classification. A single taxonomy can be limiting, however: its structure may not be optimal for hierarchical classification, and no single semantic taxonomy may perfectly align with the visual distributions. I therefore propose a way to overcome this limitation by leveraging multiple taxonomies as semantic sources, combining the complementary information acquired across multiple semantic views and granularities. This allows us, for example, to synthesize semantics from both 'Biological' and 'Appearance'-based taxonomies when learning visual features. Finally, going beyond the previous two pairwise similarity-based models to more complex semantic relations, I exploit analogies, which encode the relational similarity between two related pairs of categories. Specifically, I use analogies to regularize a discriminatively learned semantic embedding space for categorization, such that the displacement between the two category embeddings is enforced to be the same in both category pairs of the analogy. Such a constraint allows a more confusable pair of categories to benefit from the clear separation in the matched pair of categories that shares the same relation. All of these methods are evaluated on challenging public datasets and are shown to effectively improve recognition accuracy over purely discriminative models, while also guiding the recognition to be more consistent with human semantic perception. Further, the applications of the proposed methods are not limited to visual object categorization: they can be applied to any classification problem where domain knowledge exists about the relationships or structures between the classes. Possible applications outside the visual recognition domain include document classification in natural language processing and gene-based animal or protein classification in computational biology.

Item Exploiting structure in large-scale optimization for machine learning (2015-08)
Hsieh, Cho-Jui; Dhillon, Inderjit S.; Ravikumar, Pradeep; Pingali, Keshav; Tewari, Ambuj; Wright, Stephen J.
With the immense growth of data comes a great need for solving large-scale machine learning problems. Classical optimization algorithms usually cannot scale to today's huge volumes of data and/or model parameters. In this thesis, we show that these scalability issues can often be resolved by exploiting three types of structure in machine learning problems: problem structure, model structure, and data distribution. This central idea can be applied to many machine learning problems.
In this thesis, we describe in detail how to exploit structure for kernel classification and regression, matrix factorization for recommender systems, and structure learning for graphical models. We further provide comprehensive theoretical analysis of the proposed algorithms, showing both local and global convergence rates for a family of inexact first-order and second-order optimization methods.

Item Greedy structure learning of Markov Random Fields (2011-08)
Johnson, Christopher Carroll; Ravikumar, Pradeep; Dhillon, Inderjit
Probabilistic graphical models are used in a variety of domains to capture and represent general dependencies in joint probability distributions. In this document we examine the problem of learning the structure of an undirected graphical model, also called a Markov Random Field (MRF), given a set of independent and identically distributed (i.i.d.) samples. Specifically, we introduce an adaptive forward-backward greedy algorithm for learning the structure of a discrete, pairwise MRF given a high-dimensional set of i.i.d. samples. The algorithm works by greedily estimating the neighborhood of each node independently through a series of forward and backward steps. By imposing a restricted strong convexity condition on the structure of the learned graph, we show that the structure can be fully learned with high probability given $n=\Omega(d\log(p))$ samples, where $d$ is the dimension of the graph and $p$ is the number of nodes. This is a significant improvement over existing convex-optimization-based algorithms, which require a sample complexity of $n=\Omega(d^2\log(p))$ and a stronger irrepresentability condition. We further support these claims with an empirical comparison of the greedy algorithm to node-wise $\ell_1$-regularized logistic regression, as well as a real-data analysis of the greedy algorithm using the Audioscrobbler music listener dataset. This document presents an expanded account of work submitted by A. Jalali, C. Johnson, and P. Ravikumar to NIPS 2011.

Item Improving process monitoring and modeling of batch-type plasma etching tools (2015-05)
Lu, Bo, active 21st century; Edgar, Thomas F.; Stuber, John D.; Djurdjanovic, Dragan; Ekerdt, John G.; Bonnecaze, Roger T.; Baldea, Michael
Manufacturing equipment in semiconductor factories (fabs) provides abundant data and opportunities for data-driven process monitoring and modeling. In particular, virtual metrology (VM) is an active area of research. Traditional monitoring techniques using univariate statistical process control charts do not provide immediate feedback on quality excursions, hindering the implementation of fab-wide advanced process control initiatives. VM models, or inferential sensors, aim to bridge this gap by predicting quality measurements instantaneously from tool fault detection and classification (FDC) sensor measurements. Existing research on inferential sensors and VM has focused on comparing regression algorithms to demonstrate their feasibility in various applications. However, two important areas, data pretreatment and post-deployment model maintenance, are usually neglected in these discussions. Because the industrial data collected are of poor quality, and because semiconductor processes undergo drifts and periodic disturbances, these two issues are the roadblocks to wider adoption of inferential sensors and VM models.
In data pretreatment, batch data collected from FDC systems usually contain inconsistent trajectories of varying durations, while most analysis techniques require the data from all batches to be of the same duration, with similar trajectory patterns. These inconsistencies, if unresolved, propagate into the developed model, causing challenges in interpreting the modeling results and degrading model performance. To address this issue, a Constrained selective Derivative Dynamic Time Warping (CsDTW) method was developed to perform automatic alignment of trajectories. CsDTW is designed to preserve the key features that characterize each batch, and it can be solved efficiently in polynomial time. Variable selection after trajectory alignment is another topic that requires improvement; to this end, the proposed Moving Window Variable Importance in Projection (MW-VIP) method yields a more robust set of variables with demonstrably more long-term correlation with the predicted output.

In model maintenance, model adaptation has been the standard solution for dealing with drifting processes. However, most case studies have preprocessed the model update data offline, implicitly assuming that the adaptation data are free of faults and outliers, which is often not true in practical implementations. To this end, a moving window scheme using Total Projection to Latent Structures (T-PLS) decomposition screens incoming updates to separate harmless process noise from the outliers that negatively affect the model; the integrated approach was demonstrated to be more robust. In addition, model adaptation is very inefficient when there are multiplicities in the process, which can occur due to process nonlinearity, switches in product grade, or different operating conditions. A growing-structure multiple-model system using local PLS and PCA models is therefore proposed to improve model performance around process conditions with multiplicity. The use of local PLS and PCA models allows the method to handle a much larger set of inputs and overcomes several challenges found in mixture model systems; fault detection sensitivity is also improved by using the multivariate monitoring statistics of these local PLS/PCA models. The proposed methods were tested on two plasma etch data sets provided by Texas Instruments, and a proof of concept applying virtual metrology to controller performance assessment was also tested.
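
As a simplified picture of the update-screening step described above, the following sketch filters a window of incoming adaptation data with an ordinary PCA Hotelling T^2 statistic. It is a plain stand-in for the T-PLS-based scheme, and the data, dimensions, and control limit are hypothetical:

    # Moving-window screening sketch: keep only in-control samples for model
    # adaptation, using PCA Hotelling T^2 (a simplification of T-PLS screening).
    import numpy as np

    def fit_pca(X, k=2):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k], (s[:k] ** 2) / (len(X) - 1)  # mean, loadings, PC variances

    def screen(batch, mu, P, var, t2_limit):
        scores = (batch - mu) @ P.T
        t2 = np.sum(scores ** 2 / var, axis=1)  # Hotelling T^2 per sample
        return batch[t2 <= t2_limit]            # drop outliers before adaptation

    rng = np.random.default_rng(0)
    history = rng.normal(size=(500, 6))         # in-control FDC summary data
    mu, P, var = fit_pca(history)
    window = rng.normal(size=(50, 6))           # incoming adaptation window
    window[::10] += 8.0                         # inject a few faulty samples
    clean = screen(window, mu, P, var, t2_limit=12.0)
    print(f"kept {len(clean)} of {len(window)} samples for model adaptation")
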