Browsing by Subject "algorithms"
Now showing 1 - 9 of 9
Item: Algorithms for Searching and Analyzing Sets of Evolutionary Trees (2014-04-18)
Brammer, Grant

The evolutionary relationships between organisms are represented as phylogenetic trees. These trees have important implications for understanding biodiversity, tracking disease, and designing medicine. Since the evolutionary process that led to modern biodiversity was not directly recorded, phylogenetic trees are inferred from modern observations. Inferring accurate phylogenies is computationally difficult, and many inference algorithms produce multiple phylogenetic trees of equal quality. The common method for presenting a set of trees is to summarize their common features into a single consensus tree. Consensus methods make it easy to tell which features are common to a set of trees, but how does one explore the hypotheses that are not in the majority of trees? This question is best answered by a search algorithm. We present algorithms to query a set of trees based on their internal structure. Trees can be queried based on their bipartitions, quartets, clades, subtrees, or taxa, and we present a new concept which unifies edge-based relationships for search functions. To extend the power of our search functions, we provide the ability to combine the results of multiple searches using set operations. We also explore the differences between sets of trees. Clustering algorithms can detect whether there are multiple distinct hypotheses within a set of trees. Decision tree depth and distinguishing bipartitions can be used to measure the similarity between sets of trees. For situations where a set of trees is made up of multiple distinct sets, we present p-support, a measure that quantifies the impact of the individual sets on a single consensus tree. The algorithms are presented within the context of TreeHouse, an open source platform for querying and analyzing sets of trees. One goal of TreeHouse was to unite query and analysis algorithms under a single user interface.
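As an illustrative aside (not code from the dissertation), the bipartition-query-plus-set-operations idea can be sketched in a few lines. The data layout and function name here are assumptions: each tree is modeled simply as the set of its bipartitions, and each bipartition as a frozenset of the taxa on one side of an internal edge.

```python
def trees_with_bipartition(trees, bipartition):
    """Return the indices of trees whose bipartition set contains `bipartition`."""
    return {i for i, t in enumerate(trees) if bipartition in t}

# Three toy trees over taxa {A, B, C, D}, each given by its internal bipartitions.
trees = [
    {frozenset({"A", "B"})},   # tree 0 groups A+B
    {frozenset({"A", "B"})},   # tree 1 also groups A+B
    {frozenset({"A", "C"})},   # tree 2 groups A+C instead
]

ab = trees_with_bipartition(trees, frozenset({"A", "B"}))
ac = trees_with_bipartition(trees, frozenset({"A", "C"}))

# Results of separate searches combine with ordinary set operations.
print(ab | ac)   # union: {0, 1, 2}
print(ab & ac)   # intersection: set()
```

Because each query returns a plain set of tree indices, unions, intersections, and differences of searches come for free, which is the combinational power the abstract describes.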
The seamless interaction between fast filtering and analysis algorithms allows users to explore their data in a way not easily accomplished elsewhere. We believe that the algorithms in this document and in TreeHouse can shed new light on often unexplored territory.

Item: Community-Oriented Models and Applications for the Social Web (2012-07-16)
Kashoob, Said Masoud Ali

The past few years have seen the rapid rise of all things "social" on the web, from the growth of online social networks like Facebook, to user-contributed content sites like Flickr and YouTube, to social bookmarking services like Delicious, among many others. Whereas traditional approaches to organizing and accessing the web's massive amount of information have focused on content-based and link-based approaches, these social systems offer rich opportunities for user-based and community-based exploration and analysis of the web by building on the unprecedented access to the interests and perspectives of millions of users. We focus here on the challenge of modeling and mining social bookmarking systems, in which resources are enriched by large-scale socially generated metadata ("tags") and contextualized by the user communities that are associated with the resources. Our hypothesis is that an underlying social collective intelligence is embedded in the uncoordinated actions of users on social bookmarking services, and that this social collective intelligence can be leveraged for enhanced web-based information discovery and knowledge sharing. Concretely, we posit the existence of underlying implicit communities in these social bookmarking systems that drive the social bookmarking process and can provide a foundation for community-based organization of web resources. To that end, we make three contributions:
- First, we propose a pair of novel probabilistic generative models for describing and modeling community-oriented social bookmarking.
We show how these models enable effective extraction of meaningful communities over large real-world social bookmarking services.
- Second, we develop two frameworks for community-based web information browsing and search that are based on these community-oriented social bookmarking models. We show how both achieve improved discovery and exploration of the social web.
- Third, we introduce a community evolution framework for studying and analyzing social bookmarking communities over time. We explore the temporal dimension of social bookmarking and the dynamics of community formation, evolution, and dissolution.

By uncovering implicit communities, putting them to use in an application scenario (search and browsing), and analyzing them over time, this dissertation provides a foundation for the study of how social knowledge networks are self-organized, a deeper understanding and appreciation of the factors impacting collective intelligence, and the creation of new information access algorithms for leveraging these communities.

Item: Content-aware Caching and Traffic Management in Content Distribution Networks (2012-02-14)
Amble, Meghana Mukund

The rapid increase of content delivery over the Internet has led to the proliferation of content distribution networks (CDNs). Management of CDNs requires algorithms for request routing, content placement, and eviction such that user delays are small. Our objective in this work is to design feasible algorithms that solve this trio of problems. We abstract the system of front-end source nodes and back-end caches of the CDN in the likeness of the input and output nodes of a switch. In this model, queues of requests for different pieces of content build up at the source nodes, which route these requests to a cache that contains the content. For each request that is routed to a cache, a corresponding data file is transmitted back to the source across links of finite capacity.
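The switch-like abstraction above can be illustrated with a toy routing rule (an illustrative sketch under assumed data structures, not the dissertation's throughput-optimal algorithm): route each request to the shortest request queue among the caches currently holding the content.

```python
def route_request(content, cache_contents, queues):
    """Send a request for `content` to the cache with the shortest request
    queue among those holding it. Returns the chosen cache id, or None if
    no cache holds the content (a media-vault fetch would then be needed)."""
    candidates = [c for c, held in cache_contents.items() if content in held]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: len(queues[c]))
    queues[best].append(content)
    return best

# Two caches; cache 1 already has a backlog of two queued requests.
cache_contents = {1: {"video.mp4", "page.html"}, 2: {"video.mp4"}}
queues = {1: ["page.html", "page.html"], 2: []}

print(route_request("video.mp4", cache_contents, queues))  # 2 (shorter queue)
print(route_request("song.mp3", cache_contents, queues))   # None (miss)
```

Queue-length-aware routing of this flavor is what lets stable policies keep the request queues finite; the dissertation's algorithms additionally decide placement and eviction from the same state information.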
Caches are of finite size, and the content of the caches can be refreshed periodically. A requested but missing item is fetched to the cache from the media vault of the CDN. If there is not adequate space at the cache, an existing, unrequested item may be evicted from the cache to accommodate the new item. Every such cache refresh or media vault access incurs a finite cost; hence the refresh periodicity allowed to the system represents our system cost. In order to obtain small user delays, our algorithms must consider the lengths of the request queues that build up at the nodes. Stable policies ensure the finiteness of the request queues, while good policies also lead to short queue lengths. We first design a throughput-optimal algorithm that solves the routing-placement-eviction problem using instantaneous system state information. The design yields insight into the impact of different cache refresh and eviction policies on queue length. We use this insight to construct throughput-optimal algorithms that engender short queue lengths. We then propose a regime of algorithms that remedies the inherent problem of wasted capacity. We also develop heuristic variants and study their performance. We illustrate the potential of our approach and validate all our claims and results through simulations on different CDN topologies.

Item: Efficient state space exploration for parallel test generation (2009-05)
Ramasamy Kandasamy, Manimozhian; Khurshid, Sarfraz; Perry, Dewayne E.

Automating the generation of test cases for software is an active area of research. Specification-based test generation is an approach in which a formal representation of a method is analyzed to generate valid test cases. Constraint solving and state space exploration are important aspects of specification-based test generation. One problem with specification-based testing is that the size of the state space explodes when the approach is applied to code of practical size.
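The core idea of predicate-driven test generation can be sketched as follows. This is a deliberately naive illustration: real Korat prunes the candidate space aggressively by monitoring which fields the predicate actually reads, whereas this sketch only shows the filter-by-predicate idea, with an assumed validity predicate for strictly increasing sequences.

```python
from itertools import product

def rep_ok(candidate):
    """Validity predicate (assumed for illustration): strictly increasing
    sequences only, standing in for a structural invariant like sortedness."""
    return all(a < b for a, b in zip(candidate, candidate[1:]))

def naive_generate(values, length):
    """Enumerate every candidate of the given length over `values` and keep
    those satisfying rep_ok. Korat avoids most of this enumeration by
    pruning candidates the predicate never fully inspects."""
    return [c for c in product(values, repeat=length) if rep_ok(c)]

print(naive_generate((0, 1, 2), 2))   # [(0, 1), (0, 2), (1, 2)]
```

The gap between this brute-force enumeration and Korat's pruned search is exactly what the optimizations surveyed in the report (PKorat's parallelism, Fast Korat's reduced candidate exploration) attack.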
Hence, finding ways to reduce the number of candidates to explore within the state space is important for making this approach practical in industry. Korat is a tool which generates test cases for Java programs based on predicates that validate the inputs to a method. Several ongoing research efforts aim to increase the tool's effectiveness in handling large state spaces; parallelizing Korat and minimizing the exploration of invalid candidates are the active research directions. This report surveys the basic algorithms of Korat, PKorat, and Fast Korat. PKorat is a parallel version of Korat that aims to take advantage of the multi-processor and multicore systems available. Fast Korat implements four optimizations which reduce the number of candidates explored to generate valid candidates and reduce the amount of time required to explore each candidate. This report also presents execution time results for generating test candidates for a binary tree, a doubly linked list, and a sorted singly linked list from their respective predicates.

Item: Graph searching and a generalized parking function (2009-05-15)
Kostic, Dimitrije Nenad

Parking functions have been a focus of mathematical research since the mid-1970s. Various generalizations have been introduced since the mid-1990s, and deep relationships between these and other areas of mathematics have been discovered. Here we introduce a new generalization, the G-multiparking function, where G is a simple graph on a totally ordered vertex set {1, 2, ..., n}. We give an algorithm that converts a G-multiparking function into a rooted spanning forest of G by using a graph searching technique to build a sequence F1, F2, ..., Fn, where each Fi is a subforest of G and Fn is a spanning forest of G.
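The graph-searching construction above is governed by a choice of how the search proceeds at each step. As a concrete illustration (not the dissertation's algorithm), the following sketch fixes a simple FIFO discipline, i.e. plain breadth-first search, and shows how a rooted spanning forest is grown one component at a time.

```python
from collections import deque

def spanning_forest(adj, vertex_order):
    """Grow a rooted spanning forest by breadth-first search, starting a new
    tree at each not-yet-visited vertex in `vertex_order`. Returns a parent
    map: roots map to None, every other vertex to its parent."""
    parent = {}
    for root in vertex_order:
        if root in parent:
            continue
        parent[root] = None          # root of a new tree in the forest
        frontier = deque([root])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    frontier.append(v)
    return parent

# Two components: {1, 2, 3} and {4, 5}.
adj = {1: [2, 3], 2: [1], 3: [1], 4: [5], 5: [4]}
print(spanning_forest(adj, [1, 2, 3, 4, 5]))
# {1: None, 2: 1, 3: 1, 4: None, 5: 4}
```

In the dissertation's setting, the step "pick the next vertex to attach" is formalized by a choice function over the current subforest's leaves; swapping in different choice functions yields different spanning forests from the same graph.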
We also give another algorithm that converts a rooted spanning forest of G to a G-multiparking function, and prove that the resulting maps (between the sets of G-multiparking functions and rooted spanning forests of G) are bijections and inverses of each other. Each of these two algorithms relies on a choice function φ, which maps each pair (F, W), where F is a subforest of G and W is a set of some of the leaves of F, to an element of W. There are many possible choice functions for a given graph, each giving formality to the concept of choosing how a graph searching algorithm should proceed at each step (i.e., if the algorithm has reached Fi, then Fi+1 is some forest on the vertex set V(Fi) ∪ {φ(Fi, W)} for some W). We also define F-redundant edges, which are edges whose removal from G does not affect the execution of either algorithm mentioned above. This concept leads to a categorization of the edges of G, which in turn provides a new formula for the Tutte polynomial of G and other enumerative results.

Item: Horizontal Well Placement Optimization in Gas Reservoirs Using Genetic Algorithms (2011-08-08)
Gibbs, Trevor Howard

Horizontal well placement determination within a reservoir is a significant and difficult step in the reservoir development process. Determining the optimal well location is a complex problem involving many factors, including geological considerations, reservoir and fluid properties, economic costs, lateral direction, and technical ability. The most thorough approach to this problem is an exhaustive search, in which a simulation is run for every conceivable well position in the reservoir. Although thorough and accurate, this approach is typically not used in real-world applications due to the time constraints imposed by the excessive number of simulations. This project suggests the use of a genetic algorithm applied to the horizontal well placement problem in a gas reservoir to reduce the required number of simulations.
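A genetic algorithm for well placement can be sketched as below. Everything here is an illustrative assumption: the fitness surface is a toy stand-in for the reservoir simulator (recovery peaking at an arbitrary grid cell), and the population size, selection scheme, and mutation rate are placeholder choices, not the project's settings.

```python
import random

def fitness(pos):
    """Stand-in for a reservoir simulation: 'recovery' peaks at cell (7, 3).
    A real study would run the simulator for this well position instead."""
    x, y = pos
    return -((x - 7) ** 2 + (y - 3) ** 2)

def genetic_search(grid=20, pop_size=30, generations=50, mutation_rate=0.2):
    """Evolve (x, y) well positions: selection from the fittest, one-point
    crossover on the coordinate pair, random mutation, and elitism."""
    pop = [(random.randrange(grid), random.randrange(grid)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:1]                       # elitism: keep the best
        while len(next_pop) < pop_size:
            a, b = random.sample(pop[:10], 2)    # parents from the fittest ten
            child = (a[0], b[1])                 # one-point crossover
            if random.random() < mutation_rate:
                child = (random.randrange(grid), child[1])
            if random.random() < mutation_rate:
                child = (child[0], random.randrange(grid))
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

random.seed(1)
best = genetic_search()
print(best, fitness(best))
```

The appeal for well placement is that each generation evaluates only `pop_size` candidate positions, so far fewer simulator runs are needed than the exhaustive sweep over every grid cell.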
This research aims first to determine whether well placement optimization is even necessary in a gas reservoir, and if so, to determine the benefit of optimization. Performance of the genetic algorithm was analyzed through five different case scenarios, one involving a vertical well and four involving horizontal wells. The genetic algorithm approach is used to evaluate the effect of well placement in heterogeneous and anisotropic reservoirs on reservoir recovery. The wells are constrained by surface gas rate and bottom-hole pressure for each case. This project's main new contribution is its application of genetic algorithms to study the effect of well placement optimization in gas reservoirs. Two fundamental questions are answered in this research. First, does well placement in a gas reservoir affect reservoir performance? If so, what is an efficient method to find the optimal well location based on reservoir performance? The research provides evidence that well placement optimization is an important criterion during the reservoir development phase of a horizontal-well project in gas reservoirs, but it is less significant for vertical wells in a homogeneous reservoir. It is also shown that genetic algorithms are an efficient and robust tool for finding the optimal location.

Item: Mathematical Foundations and Algorithms for Clique Relaxations in Networks (2012-02-14)
Pattillo, Jeffrey

This dissertation establishes mathematical foundations for the properties exhibited by generalizations of cliques, as well as algorithms to find such objects in a network. Cliques are a model of an ideal group with roots in social network analysis. They have since found applications as part of grouping mechanisms in computer vision, coding theory, experimental design, genomics, economics, and telecommunications, among other fields. Because only groups with ideal properties form a clique, cliques are often too restrictive for identifying groups in many real-world networks.
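One of the relaxations studied in this area, the k-core, loosens the clique requirement "adjacent to every other member" to "adjacent to at least k members", and it can be found by a simple peeling procedure. A minimal sketch (illustrative; the dissertation covers this and five other relaxations in far more depth):

```python
def k_core(adj, k):
    """Vertices of the k-core: the maximal subgraph in which every vertex
    has at least k neighbors, obtained by repeatedly peeling off vertices
    whose remaining degree falls below k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(1 for u in adj[v] if u in alive) < k:
                alive.discard(v)
                changed = True
    return alive

# A triangle {1, 2, 3} with a pendant vertex 4 hanging off vertex 1.
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1]}
print(sorted(k_core(adj, 2)))  # [1, 2, 3]: vertex 4 is peeled away
```

Unlike most clique-relaxation problems, the k-core is computable in polynomial time, which is one reason the structural taxonomy matters: different relaxations with similar-sounding definitions can have very different computational complexity.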
This motivated the introduction of clique relaxations that preserve some of the defining properties of cliques in relaxed form. Six clique relaxations are the focus of this dissertation: s-clique, s-club, s-plex, k-core, quasi-clique, and k-connected subgraphs. Since cliques have found applications in so many fields, research into these clique relaxations has the potential to steer the course of much future research. The focus of this dissertation is on bringing organization and rigorous methodology to the formation and application of clique relaxations. We provide the first taxonomy focused on how the various clique relaxations relate on key structural properties demonstrated by groups. We also give a framework for how clique relaxations can be formed. This equips researchers with the ability to choose the appropriate clique relaxation for an application based on its structural properties or, if an appropriate clique relaxation does not exist, to form a new one. In addition to identifying the structural properties of the various clique relaxations, we identify properties and prove propositions that are important computationally. These assist in creating algorithms to quickly find a clique relaxation immersed in a network. We give the first analysis of the computational complexity of finding the maximum quasi-clique in a graph. Such analysis identifies for researchers the appropriate set of computational tools to solve the maximum quasi-clique problem. We further create a polynomial time algorithm for identifying large 2-cliques within unit disk graphs, a special class of graphs often arising in communication networks.
We prove the algorithm to have a guaranteed 1/2-approximation ratio and finish with computational results.

Item: Synthesizing Robust Networks for Engineering Applications with Resource Constraints (2014-04-29)
Nagarajan, Harsha

This dissertation deals with the following simpler version of an open problem in system realization theory, which has several important engineering applications: given a collection of masses, a set of linear springs with a specified cost and stiffness, and a resource constraint in the form of a budget on the total cost, determine an optimal connection of masses and springs so that the resulting structure is as stiff as possible, i.e., the structure is connected and its smallest nonzero natural frequency is as large as possible. One often encounters variants of this problem in deploying Unmanned Aerial Vehicles (UAVs) for civilian and military applications. In such problems, one must determine the pairs of UAVs that must maintain a communication link so that constraints on resources and performance, such as limits on the maximum number of communication links deployed, the power consumed, and the maximum latency in routing information from one UAV to another, are met and a performance objective is maximized. In this dissertation, algebraic connectivity, or its mechanical analog, the smallest non-zero natural frequency of a connected structure, was chosen as the performance objective. Algebraic connectivity determines the convergence rate of consensus protocols and error attenuation in UAV formations, and can be viewed as a measure of robustness of UAV communication networks to random node failures. Underlying both the mechanical and UAV network synthesis problems is a Mixed Integer Semi-Definite Program (MISDP), which was recently shown to be an NP-hard problem. There has not been any systematic procedure in the literature for solving this problem.
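For reference, algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian L = D - A. Computing it for a fixed graph is straightforward (a minimal sketch below, using NumPy); the dissertation's hard problem is choosing which edges to include, under a budget, so that this quantity is maximized.

```python
import numpy as np

def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue of the Laplacian L = D - A of an
    undirected graph given by its symmetric adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # 3-vertex path
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # 3-vertex cycle (denser)

print(algebraic_connectivity(path3))     # approx. 1.0
print(algebraic_connectivity(triangle))  # approx. 3.0: better connected
```

The denser triangle scores higher than the path on the same vertex set, matching the intuition that algebraic connectivity measures how robustly connected (how "stiff") the network is.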
This dissertation is aimed at addressing this void in the literature. The novel contributions of this dissertation are as follows:
a) An iterative primal-dual algorithm and an algorithm based on an outer approximation of the semi-definite constraint via a cutting plane technique were developed for computing optimal algebraic connectivity; these algorithms are based on a polyhedral approximation of the feasible set of the MISDP.
b) A bisection algorithm was developed to reduce the MISDP to a Binary Semi-Definite Program (BSDP) for better computational efficiency.
c) The feasible set of the MISDP was efficiently relaxed by replacing the positive semi-definite constraint with linear inequalities associated with a family of Fiedler vectors, yielding a tighter upper bound for algebraic connectivity.
d) Efficient neighborhood search heuristics based on greedy methods, such as the k-opt and improved k-opt heuristics, were developed.
e) Variants of the problem occurring in UAV backbone networks and air transportation management were considered, along with numerical simulations corroborating the methodologies developed in this dissertation.

Item: Towards Real Time Optimal Auto-tuning of PID Controllers (2013-08-22)
Hill, Aaron Jamison

The Proportional-Integral-Derivative (PID) controller has been widely used by the process control industry for many years. Design methods for PID controllers are mature and have been heavily researched and evaluated. For most of its modern history, the Ziegler-Nichols methods have been used for tuning PID controllers into desired operating conditions. Recently, automatic tuning methods have been formulated and used to generate stable PID-controlled systems. These methods have also been implemented on real time systems. However, the use of optimal methods for auto-tuning PID controllers on real time systems has not seen much discussion.
In this thesis we explore the applicability of the optimal PID design methods of Datta, Ho, and Bhattacharyya to real time system control. The design method is based on a complete characterization of the set of stabilizing PID parameters for various plant models and a subsequent search over the stabilizing set for the optimal controller. A full implementation of the algorithms is completed on an embedded system with DSP hardware. These implementations are then tested against a large number of examples to determine both their accuracy and their applicability to real time systems. The major design constraint for applying these algorithms to real time systems is computation time: the faster the optimal result can be computed, the more applicable the algorithm is to a real time environment. In order to bring each of these algorithms into a real time system, fast search algorithms were developed to quickly compute the optimal result for a given performance criterion. Three different search methods were developed, compared, and analyzed. The first method is a brute force search, used as a baseline for comparing the two additional fast search methods. The two faster search methods prove to be vastly superior at determining the optimal result, achieving the same level of accuracy as the brute force search in greatly reduced time. These search methods achieve their superior speeds by reducing the search space without sacrificing accuracy of the results. With these two fast search methods applied to the complete characterization of stabilizing PID controllers, application to real time systems is achieved and demonstrated through examples of various performance criteria.
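The brute-force baseline described above can be sketched as follows. The first-order plant, the forward-Euler simulation, the gain grid, and the integral-squared-error criterion are all illustrative assumptions, not the plant models or performance criteria characterized in the thesis.

```python
def step_response_cost(kp, ki, kd, dt=0.01, steps=500):
    """Integral-squared-error of a unit-step response for a PID loop closed
    around a toy first-order plant dy/dt = -y + u (forward-Euler integration)."""
    y = integral = cost = 0.0
    prev_err = 1.0
    for _ in range(steps):
        err = 1.0 - y
        integral += err * dt
        derivative = (err - prev_err) / dt
        prev_err = err
        u = kp * err + ki * integral + kd * derivative
        y += dt * (-y + u)          # plant update
        cost += err * err * dt      # accumulate squared tracking error
    return cost

def brute_force_tune(grid):
    """Baseline search: simulate every candidate gain triple, keep the best.
    The fast methods in the thesis instead shrink the set of triples tried."""
    return min(grid, key=lambda gains: step_response_cost(*gains))

grid = [(kp, ki, 0.0) for kp in (0.5, 2.0, 8.0) for ki in (0.1, 1.0, 5.0)]
print(brute_force_tune(grid))
```

Even this tiny nine-point grid makes the computation-time problem visible: each candidate costs a full closed-loop simulation, so reducing the number of candidates searched, as the thesis's fast methods do, translates directly into real time feasibility.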