Browsing by Subject "High-performance computing"
Now showing 1 - 4 of 4
Item: Computational process networks : a model and framework for high-throughput signal processing (2011-05)
Allen, Gregory Eugene; Evans, Brian L. (Brian Lawrence), 1965-; Browne, James C.; Chase, Craig M.; John, Lizy K.; Loeffler, Charles M.

Many signal and image processing systems for high-throughput, high-performance applications require concurrent implementations in order to realize the desired performance. Developing software for concurrent systems is widely acknowledged to be difficult, and common industry practice leaves the burden of preventing concurrency problems on the programmer. The Kahn Process Network model provides the mathematically provable property that a program's result is deterministic regardless of the execution order of its processes, including concurrent execution. The model is also natural for describing streams of data samples in a signal processing system, where processes transform streams from one data type to another. However, a Kahn Process Network may require infinite memory to execute. I present the dynamic distributed deadlock detection and resolution (D4R) algorithm, which permits execution of Process Networks in bounded memory whenever that is possible. It detects local deadlocks in a Process Network, determines whether a deadlock can be resolved and, if so, identifies the process that must take action to resolve it. I propose the Computational Process Network (CPN) model, which is based on the formalisms of Kahn’s PN model but adds enhancements designed to make it efficiently implementable. These enhancements include multi-token transactions to reduce execution overhead, multi-channel queues for multi-dimensional synchronous data, zero-copy semantics, and consumer and producer firing thresholds for queues. Firing thresholds enable memoryless computation of sliding window algorithms, which are common in signal processing systems.
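The firing-threshold idea can be illustrated with a small sketch (hypothetical Python, not the dissertation's C++ framework; `ThresholdQueue` and its methods are invented for illustration): a queue whose reader blocks until `threshold` tokens are visible but dequeues only `consume` of them, so a sliding window advances over the stream without recopying its history.

```python
import threading
from collections import deque

class ThresholdQueue:
    """Toy FIFO illustrating consumer firing thresholds: a reader may
    inspect `threshold` tokens but dequeue only `consume` of them,
    leaving the overlap in place for the next firing."""
    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity
        self.cv = threading.Condition()

    def put(self, tokens):
        with self.cv:
            self.cv.wait_for(lambda: len(self.buf) + len(tokens) <= self.capacity)
            self.buf.extend(tokens)
            self.cv.notify_all()

    def fire(self, threshold, consume):
        """Block until `threshold` tokens are visible; return them all,
        but dequeue only `consume` (threshold - consume tokens overlap)."""
        with self.cv:
            self.cv.wait_for(lambda: len(self.buf) >= threshold)
            window = list(self.buf)[:threshold]
            for _ in range(consume):
                self.buf.popleft()
            self.cv.notify_all()
            return window

# 3-point moving average over a stream: window of 3, advancing by 1.
q = ThresholdQueue(capacity=8)
q.put([1.0, 2.0, 3.0, 4.0, 5.0])
averages = []
for _ in range(3):
    w = q.fire(threshold=3, consume=1)
    averages.append(sum(w) / len(w))
print(averages)  # [2.0, 3.0, 4.0]
```

Each firing sees three samples but consumes one, which is exactly the overlap a sliding-window filter needs; a plain FIFO would force the process to buffer the overlap itself.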
I show that the Computational Process Network model preserves the formal properties of Process Networks while reducing the operations required to implement sliding window algorithms on continuous streams of data. I also present a high-throughput software framework that implements the Computational Process Network model in C++ and maps naturally onto distributed targets. The framework uses POSIX threads and can exploit parallelism in both multi-core and distributed systems. Finally, I present case studies that exercise the framework and demonstrate its performance and utility. The final case study is a three-dimensional circular convolution sonar beamformer and replica correlator, which demonstrates the high throughput and scalability of a real-time signal processing algorithm using the CPN model and framework.

Item: Design by transformation : from domain knowledge to optimized program generation (2014-05)
Marker, Bryan Andrew; Van de Geijn, Robert A.; Batory, Don S., 1953-

Expert design knowledge is essential to develop a library of high-performance software. Such knowledge includes how to implement and parallelize domain operations, how to optimize implementations, and estimates of which implementation choices are best. An expert repeatedly applies this knowledge, often in a rote and tedious way, to develop all of the related functionality expected from a domain-specific library. Expert knowledge is hard to gain and is easily lost over time, when an expert forgets or when a new engineer starts developing code. The domain of dense linear algebra (DLA) is a prime example: its software is so well designed that much of the experts' important work has become rote. In this dissertation, we demonstrate how one can encode design knowledge for DLA so that it can be automatically applied to generate code as an expert would, or to generate better code.
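The encode-and-apply idea can be sketched in miniature (a toy illustration, not the actual DxT system; the rule names and cost estimates below are invented): encode each piece of design knowledge as a rewrite rule taking an abstract operation to a concrete implementation with an estimated cost, then enumerate fully refined programs and keep the cheapest, as an expert would.

```python
# Toy sketch of knowledge-as-rewrite-rules: each abstract operation maps
# to candidate (implementation, estimated cost) pairs; a generator
# enumerates every fully refined program and ranks them by total cost.
from itertools import product

RULES = {
    "matmul":  [("blocked_matmul", 10.0), ("parallel_matmul", 4.0)],
    "trsolve": [("row_trsolve", 6.0), ("blocked_trsolve", 5.0)],
}

def implementations(program):
    """Yield every concrete program obtainable by choosing one rule
    per abstract operation, together with its total estimated cost."""
    choices = [RULES[op] for op in program]
    for combo in product(*choices):
        impl = [name for name, _ in combo]
        cost = sum(c for _, c in combo)
        yield impl, cost

abstract = ["matmul", "trsolve"]  # an abstract two-step algorithm
best_impl, best_cost = min(implementations(abstract), key=lambda ic: ic[1])
print(best_impl, best_cost)  # ['parallel_matmul', 'blocked_trsolve'] 9.0
```

The point of the sketch is that once the rules exist, re-deriving an optimized variant for a new operation (or a new cost model for new hardware) is mechanical rather than rote expert labor.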
Further, the knowledge is encoded for perpetuity, so it can be reused to ease implementing functionality on new hardware or to teach a non-expert how the software is designed. We call this approach to software engineering (encoding expert knowledge and automatically applying it) Design by Transformation (DxT). We present our vision, the methodology, a prototype code generation system, and the possibilities that arise when applying DxT to the domain of dense linear algebra.

Item: Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation (2015-12)
Schatz, Martin Daniel; Van de Geijn, Robert A.; Kolda, Tamara G.; Stanton, John F.; Pingali, Keshav; Hammond, Jeff R.; Batory, Don S.

A goal of computer science is to develop practical methods to automate tasks that are otherwise too complex or tedious to perform manually. Such tasks include determining a practical algorithm and creating the associated implementation for a given problem specification. Goal-oriented programming can make this systematic: by expressing the task of creating implementations in goal-oriented terms, we can rely on automated tools to create the implementations for us. Doing so requires encoding the pertinent knowledge, which in turn requires a notation and language for defining the relevant abstractions. This dissertation focuses on distributed-memory parallel tensor computations arising in computational chemistry. Specifically, we focus on applications based on the tensor contraction operation for dense, non-symmetric tensors. Creating an efficient algorithm for a given problem specification in this domain is complex; creating an optimized implementation of a developed algorithm is even more complex, tedious, and error-prone. To this end, we encode pertinent knowledge for distributed-memory parallel algorithms for tensor contractions of dense, non-symmetric tensors.
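For orientation, the pattern behind a distributed contraction can be sketched in pure Python (a hypothetical mini-example, not the ROTE API): block-distribute the contracted mode over the processes, let each process contract its local slabs, then reduce (sum) the partial results, which is the kind of distribution-and-reduction structure a formal notation has to capture.

```python
# Hypothetical sketch of a distributed contraction C[i][j] = sum_k A[i][k]*B[k][j]:
# the contracted mode k is block-distributed over p simulated processes.

def contract_local(A_slab, B_slab):
    """Partial contraction over this process's local range of k."""
    I, J = len(A_slab), len(B_slab[0])
    C = [[0.0] * J for _ in range(I)]
    for i in range(I):
        for k in range(len(B_slab)):
            for j in range(J):
                C[i][j] += A_slab[i][k] * B_slab[k][j]
    return C

def distributed_contract(A, B, p):
    K = len(B)
    partials = []
    for r in range(p):  # process r owns k in [r*K//p, (r+1)*K//p)
        lo, hi = r * K // p, (r + 1) * K // p
        A_slab = [row[lo:hi] for row in A]   # local piece of A
        B_slab = B[lo:hi]                    # local piece of B
        partials.append(contract_local(A_slab, B_slab))
    # Reduction over the contracted mode: sum the partial results.
    return [[sum(P[i][j] for P in partials) for j in range(len(partials[0][0]))]
            for i in range(len(partials[0]))]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(distributed_contract(A, B, p=2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Distributing a non-contracted mode instead would need no reduction but a different initial data layout; choosing among such layouts, and the redistributions between them, is precisely what makes the design space large.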
We do this by developing a notation for data distribution and redistribution that exposes a systematic procedure for deriving a family of algorithms for this operation for which efficient implementations exist. We validate the developed ideas by implementing them in the Redistribution Operations and Tensor Expressions application programming interface (ROTE API) and by encoding them into an automated system, DxTer, that systematically generates efficient implementations from problem specifications. Experiments on the IBM Blue Gene/Q and Cray XC30 architectures, testing generated implementations of the spin-adapted coupled cluster singles and doubles method from computational chemistry, demonstrate impact in terms of both performance and storage requirements.

Item: Extending the reach of algorithms for the calculation of molecular vibronic spectra (2016-12)
Rabidoux, Scott Michael; Stanton, John (John F.); Eijkhout, Victor; Arbogast, Todd; van de Geijn, Robert; Makarov, Dmitrii

Theoretical spectroscopy is an important field of chemistry that helps extract useful information about a molecule's properties from experimental spectral data. Ab initio calculations of molecular spectra can be performed and compared against experimental data to assess the validity of various calculated molecular properties. Unfortunately, the computational cost of these spectral simulations rises quickly with the number of atoms in the molecule of interest. As a result, current techniques for simulating molecular spectra are often limited to only the smallest of molecules. The main purpose of this work is to develop new computational tools that extend the reach of current state-of-the-art spectral simulation algorithms and allow for the spectroscopic study of larger molecules than is currently feasible.
The calculation of vibronic spectra requires solving the time-independent Schrödinger equation to obtain the vibronic energy levels of a molecule and their corresponding transition intensities. When the Born-Oppenheimer approximation is applicable, the vibrational energy levels of a molecule can be determined analytically if the harmonic approximation is used. What remains for a spectral simulation, then, is the calculation of the transition intensities associated with each energy level. Under the harmonic approximation, the transition intensities (also known as Franck-Condon factors) can be calculated via a set of recurrence equations developed by Doktorov, Malkin, and Man'ko. Implementing these recurrence equations, though, can be computationally intensive for medium-to-large molecules, especially for finite-temperature simulations. In this work, I present a new algorithm for calculating Franck-Condon factors via the Doktorov recurrence equations that achieves significantly better computational performance than existing implementations, with speedups of roughly thirty times on a single processor.

When the Born-Oppenheimer approximation is not applicable, vibronic coupling effects must be accounted for to achieve an accurate spectral simulation. A common approach for treating vibronic coupling is to solve the time-independent Schrödinger equation using a model Hamiltonian developed by Köppel, Domcke, and Cederbaum (KDC). With the KDC approach, solving the Schrödinger equation becomes a problem of solving for the eigenstates of a large, sparse matrix, so the computational difficulty depends on the size of that matrix. Unfortunately, the matrix size grows exponentially with the number of vibrational modes of the molecule of interest, and matrix dimensions can easily reach one billion and beyond.
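What makes such matrices tractable at all is that a matrix-vector product touches only the nonzero entries, so iterative eigensolvers never form the dense matrix. A toy power iteration shows the sparse-matvec structure (production codes use Lanczos- or Davidson-type methods, and the matrix format here is invented for illustration):

```python
# Toy stand-in for the iterative sparse eigensolvers used in practice:
# power iteration on a symmetric matrix stored as per-row {column: value}
# dicts, so each matrix-vector product costs O(nonzeros), not O(n^2).
import math

def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] maps column index -> value."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def power_iteration(rows, iters=200):
    """Approximate the dominant eigenvalue/eigenvector of a symmetric
    sparse matrix by repeated normalized matvecs."""
    x = [1.0] * len(rows)
    lam = 0.0
    for _ in range(iters):
        y = spmv(rows, x)
        lam = math.sqrt(sum(v * v for v in y))  # ||Ax|| -> |largest eigenvalue|
        x = [v / lam for v in y]
    return lam, x

# 3x3 symmetric example: diag(2, 3, 4) with a small off-diagonal coupling.
rows = [{0: 2.0, 1: 0.1},
        {0: 0.1, 1: 3.0},
        {2: 4.0}]
lam, vec = power_iteration(rows)
print(round(lam, 4))  # 4.0 (the largest eigenvalue)
```

The same principle, combined with distributing the rows across nodes, is what a distributed-memory eigensolver for billion-dimensional Hamiltonians rests on.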
To make problems involving very large matrices tractable, I present in this work a distributed-memory parallelization strategy for the KDC approach. The resulting parallel algorithm achieves impressive parallel scalability and has been used to study several previously intractable spectroscopic problems. I conclude this work by presenting ab initio calculations and spectral simulations for the molecule trans-1,3-butadiene. The spectroscopy of butadiene has been studied by experimentalists and theorists alike, but a complete KDC spectral model has yet to be achieved, due in part to the large size of the resulting matrices. Using the newly developed parallel algorithm described above, I am able to present spectra from simulations using the most complete KDC model for butadiene to date and to discuss what these results may tell us about butadiene’s electronic structure.