Browsing by Subject "Parallel programming"
Now showing 1 - 5 of 5
Results Per Page
Sort Options
Item Asynchronous automatic-signal monitors with multi-object synchronization(2016-12) Hung, Wei-Lun; Garg, Vijay K. (Vijay Kumar), 1963-; Julien, Christine; Khurshid, Sarfraz; Mittal, Neeraj; Perry, Dewayne E; Pingali, KeshavOne of the fundamental problems in parallel programming is that there is no simple programming paradigm that provides mutual exclusion and synchronization with efficient implementation at the same time. For monitor [Hoa74,Han75] (lock-based) systems, only experienced programmers can develop high-performance fine-grained lock-based implementations. Programmers frequently introduce bugs with traditional monitors. Researchers have proposed transactional memory [HM93,ST95], which provides a simple and elegant mechanism for programmers to atomically execute a set of memory operations so that there is no deadlock in transactional memory systems. However, most of transactional memory systems lack conditional synchronization support [WLS14,LW14]. Hence, writing multi-threaded programs with conditional synchronization is rather difficult. In this dissertation, we develop a parallel programming framework that provides simple constructs for mutual exclusion and synchronization as well as efficient implementation.Item Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation(2015-12) Schatz, Martin Daniel; Van de Geijn, Robert A.; Kolda, Tamara G.; Stanton, John F; Pingali, Keshav; Hammond, Jeff R; Batory, Don SA goal of computer science is to develop practical methods to automate tasks that are otherwise too complex or tedious to perform manually. Complex tasks can include determining a practical algorithm and creating the associated implementation for a given problem specification. Goal-oriented programming can make this systematic. Therefore, we can rely on automated tools to create implementations by expressing the task of creating implementations in terms of goal-oriented programming. To do so, pertinent knowledge must be encoded which requires a notation and language to define relevant abstractions. This dissertation focuses on distributed-memory parallel tensor computations arising from computational chemistry. Specifically, we focus on applications based on the tensor contraction operation of dense, non-symmetric tensors. Creating an efficient algorithm for a given problem specification in this domain is complex; creating an optimized implementation of a developed algorithm is even more complex, tedious, and error-prone. To this end, we encode pertinent knowledge for distributed-memory parallel algorithms for tensor contractions of dense non-symmetric tensors. We do this by developing a notation for data distribution and redistribution that exposes a systematic procedure for deriving a family of algorithms for this operation for which efficient implementations exist. We validate the developed ideas by implementing them in the Redistribution Operations and Tensor Expressions application programming interface (ROTE API) and encoding them into an automated system, DxTer, for systematically generating efficient implementations from problem specifications. Experiments performed on the IBM Blue Gene/Q and Cray XC30 architectures testing generated implementations for the spin-adapted coupled cluster singles and doubles method from computational chemistry demonstrate impact both in terms of performance and storage requirements.Item Elixir: synthesis of parallel irregular algorithms(2015-05) Prountzos, Dimitrios; Pingali, Keshav; Misra, Jayadev; Batory, Don; Cook, William; Sagiv, Mooly; Gulwani, SumitAlgorithms in new application areas like machine learning and data analytics usually operate on unstructured sparse graphs. Writing efficient parallel code to implement these algorithms is very challenging for a number of reasons. First, there may be many algorithms to solve a problem and each algorithm may have many implementations. Second, synchronization, which is necessary for correct parallel execution, introduces potential problems such as data-races and deadlocks. These issues interact in subtle ways, making the best solution dependent both on the parallel platform and on properties of the input graph. Consequently, implementing and selecting the best parallel solution can be a daunting task for non-experts, since we have few performance models for predicting the performance of parallel sparse graph programs on parallel hardware. This dissertation presents a synthesis methodology and a system, Elixir, that addresses these problems by (i) allowing programmers to specify solutions at a high level of abstraction, and (ii) generating many parallel implementations automatically and using search to find the best one. An Elixir specification consists of a set of operators capturing the main algorithm logic and a schedule specifying how to efficiently apply the operators. Elixir employs sophisticated automated reasoning to merge these two components, and uses techniques based on automated planning to insert synchronization and synthesize efficient parallel code. Experimental evaluation of our approach demonstrates that the performance of the Elixir generated code is competitive to, and can even outperform, hand-optimized code written by expert programmers for many interesting graph benchmarks.Item Galois : a system for parallel execution of irregular algorithms(2015-05) Nguyen, Donald Do; Pingali, Keshav; Alvisi, Lorenzo; Lethin, Richard; Padua, David; Witchel, EmmettA programming model which allows users to program with high productivity and which produces high performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling also are described. An evaluation of a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often orders of magnitude more, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.Item Prioritization via stochastic optimization(2010-05) Koc, Ali; Popova, Elmira; Morton, David P.; Bard, Jonathan; Caramanis, Constantine; Hess, Stephen; Kutanoglu, ErhanWe take a novel perspective on real-life decision making problems involving binary activity-selection decisions that compete for scarce resources. The current literature in operations research approaches these problems by forming an optimal portfolio of activities that meets the specified resource constraints. However, often practitioners in industry and government do not take the optimal-portfolio approach. Instead, they form a rank-ordered list of activities and select those that have the highest priority. The academic literature tends to discredit such ranking schemes because they ignore dependencies among the activities. Practitioners, on the other hand, sometimes discredit the optimal-portfolio approach because if the problem parameters change, the set of activities that was once optimal no longer remains optimal. Even worse, the new optimal set of activities may exclude some of the previously optimal activities, which they may have already selected. Our approach takes both viewpoints into account. We rank activities considering both the uncertainty in the problem parameters and the optimal portfolio that will be obtained once the uncertainty is revealed. We use stochastic integer programming as a modeling framework. We develop several mathematical formulations and discuss their relative merits, comparing them theoretically and computationally. We also develop cutting planes for these formulations to improve computation times. To be able to handle larger real-life problem instances, we develop parallel branch-and-price algorithms for a capital budgeting application. Specifically, we construct a column-based reformulation, develop two branching strategies and a tabu search-based primal heuristic, propose two parallelization schemes, and compare these schemes on parallel computing environments using commercial and open-source software. We give applications of prioritization in facility location and capital budgeting problems. In the latter application, we rank maintenance and capital-improvement projects at the South Texas Project Nuclear Operating Company, a two-unit nuclear power plant in Wadsworth, Texas. We compare our approach with several ad hoc ranking schemes similar to those used in practice.