Browsing by Subject "Parallel computing"
Now showing 1 - 11 of 11
Item A Parallel Graph Partitioner for STAPL (2013-04-26) Castet, Nicolas
Multi-core architectures are present throughout a large selection of computing devices, from cell phones to supercomputers. Parallel applications running on these devices solve bigger problems in a shorter time. Writing those applications is a difficult task for programmers: they need to deal with low-level parallel mechanisms such as data distribution, inter-processor communication, and task placement. The goal of the Standard Template Adaptive Parallel Library (STAPL) is to provide a generic high-level framework for developing parallel applications. One of the first steps of a parallel application is to partition and distribute the data throughout the system. An important data structure for parallel applications, used to store large amounts of data and model many types of relations, is the graph. A mesh, which is a special type of graph, is often used to model a spatial domain in scientific applications. Graph and mesh partitioning has many applications, such as VLSI circuit design, parallel task scheduling, and data distribution. Data distribution significantly impacts the performance of a parallel application. In this thesis, we introduce the STAPL Parallel Graph Partitioner Framework. This framework provides a generic infrastructure to partition arbitrary graphs and meshes and to build customized partitioners. It includes the state-of-the-art parallel k-way multilevel scheme for partitioning arbitrary graphs, a parallel mesh partitioner with parameterized partition shape, and a customized partitioner used for discrete ordinates particle transport computations. This framework is also part of a generic library, STAPL, allowing the partitioning of the data and the development of the whole parallel application to be done in the same environment. We show the user-friendly interface of the framework and its scalability for partitioning different mesh and graph benchmarks on a Cray XE6 system.
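Partition quality in this setting is typically judged by the edge cut (a proxy for inter-processor communication) and the load balance across partitions. A generic toy sketch of these two metrics, illustrative only and unrelated to STAPL's actual interface:

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest partition size divided by the ideal (average) size; 1.0 is perfect."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return max(sizes.values()) / ideal

# A 2x3 grid graph split into its top row (partition 0) and bottom row (partition 1).
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(edge_cut(edges, part))   # 3: the vertical edges (0,3), (1,4), (2,5) are cut
print(imbalance(part, 2))      # 1.0: both partitions hold three vertices
```

A partitioner searches for an assignment that keeps the imbalance near 1.0 while minimizing the cut.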
We also highlight the performance of our customized unstructured mesh partitioner for a discrete ordinates particle transport code. The developed columnar decompositions significantly reduce the execution time of simultaneous sweeps on unstructured meshes.

Item Design and implementation of distributed Galois (2013-05) Dhanapal, Manoj; Pingali, Keshav
The Galois system provides a solution to the hard problem of parallelizing irregular algorithms using amorphous data-parallelism. The present system works on the shared-memory programming model, which limits the memory and processing power available to the application. A scalable distributed parallelization tool would give the application access to a very large amount of memory and processing power by interconnecting computers through a network. This thesis presents the design of a distributed execution programming model for the Galois system. This distributed Galois system is capable of executing irregular graph-based algorithms in a distributed environment. The API and programming model of the new distributed system have been designed to mirror those of the existing shared-memory Galois, so that existing shared-memory applications can run on distributed Galois with minimal porting effort. Finally, two existing test cases have been implemented on distributed Galois and shown to scale with an increasing number of hosts and threads.

Item Development and application of a parallel compositional reservoir simulator (2012-08) Ghasemi Doroh, Mojtaba; Sepehrnoori, Kamy, 1951-; Delshad, Mojdeh
Simulation of large-scale and complex reservoirs requires fine and detailed gridding, which involves a significant amount of memory and is computationally expensive. Nowadays, clusters of PCs and high-performance computing (HPC) centers are widely available. These systems allow parallel processing, which helps large-scale simulations run faster and more efficiently.
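Several of these abstracts report results in terms of speedup and parallel efficiency, which follow the standard strong-scaling definitions. A small worked sketch with hypothetical timings (the numbers are made up, not from any of the theses):

```python
def speedup(t_serial, t_parallel):
    """Classic strong-scaling speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E(p) = S(p) / p; 1.0 is ideal linear scaling."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical run: 1200 s serially, 200 s on 8 cores.
s = speedup(1200.0, 200.0)        # 6.0x faster
e = efficiency(1200.0, 200.0, 8)  # 0.75, i.e. 75% of ideal
print(s, e)
```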
In this research project, we developed a parallel version of The University of Texas Compositional Simulator (UTCOMP). The parallel UTCOMP is capable of running on both shared- and distributed-memory parallel computers. This parallelization included all physical features of the original code, such as higher-order finite differences, physical dispersion, and asphaltene precipitation. The parallelization was verified for several case studies using multiple processors. The parallel simulator produces the outputs required for visualizing simulation results with the S3graph visualization software. The efficiency of the parallel simulator was assessed in terms of speedup using various numbers of processors. Subsequently, we improved the coding and implementation of the simulator to minimize communication between processors and thereby improve parallel efficiency. To improve the efficiency of the linear solver in the simulator, we implemented three well-known high-performance parallel solver packages (SAMG, Hypre, and PETSc). The performance of these solver packages was then tuned, in terms of their input parameters, for solving large-scale reservoir simulation problems. The developed parallel simulator has expanded the capability of the original code for simulating large-scale reservoir case studies: with a sufficient number of processors, a field-scale simulation with a million grid cells can be performed in a few hours. Several case studies are presented to show the performance of the parallel simulator.

Item Domain decomposition methods in geomechanics (2012-08) Florez Guzman, Horacio Antonio; Wheeler, Mary F. (Mary Fanett); Delshad, Mojdeh; Mear, Mark; Landis, Chad; Rodriguez, Adolfo
Hydrocarbon production or injection of fluids in the reservoir can change the rock stresses and in-situ geomechanics, potentially leading to compaction and subsidence with harmful effects on wells, cap-rock, faults, and the surrounding environment. To tackle these changes and their impact, accurate simulations are essential. The Mortar Finite Element Method (MFEM) has been demonstrated to be a powerful technique for formulating a weak continuity condition at the interface of sub-domains in which different meshes (i.e., non-conforming or hybrid) and/or variational approximations are used. This is particularly suitable when coupling different physics on different domains, such as elasticity and poroelasticity, in the context of coupled flow and geomechanics. In this dissertation, popular Domain Decomposition Methods (DDM) are implemented in order to carry out large simulations by taking full advantage of current parallel computer architectures. Different solution schemes can be defined depending upon the way information is exchanged between sub-domain interfaces. Three different schemes, i.e., Dirichlet-Neumann (DN), Neumann-Neumann (NN), and MFEM, are tested, and the advantages and disadvantages of each are identified. As a first contribution, the MFEM is extended to deal with curved interfaces represented by Non-Uniform Rational B-Splines (NURBS) curves and surfaces. The goal is to have a more robust geometrical representation for mortar spaces, which allows gluing non-conforming interfaces on realistic geometries. The resulting mortar saddle-point problem is decoupled by means of the DN- and NN-DDM. Additionally, a reservoir geometry reconstruction procedure based on NURBS surfaces is presented.
The technique builds a robust piecewise-continuous geometrical representation that can be exploited by the MFEM to tackle realistic problems, which is a second contribution. Tensor-product meshes are usually propagated from the reservoir in a conforming way into its surroundings, which makes non-matching interfaces highly attractive in this case. In the context of reservoir compaction and subsidence estimation, it is common to deal with serial legacy codes for flow. Indeed, major reservoir simulators such as compositional codes lack parallelism. Another issue is the fact that, generally speaking, the flow and mechanics domains are different. To overcome this limitation, a serial-parallel approach is proposed in order to couple serial flow codes with our parallel mechanics code by means of iterative coupling. Concrete results in loose coupling are presented as a third contribution. As a final contribution, the DN-DDM is applied to couple elasticity and plasticity, which seems very promising for speeding up computations involving poroplasticity. Several examples of coupling elasticity, poroelasticity, and plasticity, ranging from near-wellbore applications to field-level subsidence computations, help to show that the proposed methodology can handle problems of practical interest. To facilitate the implementation of complex workflows, an advanced Python wrapper interface that provides programming capabilities has been implemented. The proposed serial-parallel approach appears appropriate for handling geomechanical problems involving different meshes for flow and mechanics, as well as for coupling parallel mechanics codes with legacy flow simulators.

Item Improving the efficiency of dynamic traffic assignment through computational methods based on combinatorial algorithm (2011-08) Nezamuddin; Waller, S. Travis
Transportation planning and operation require determining the state of the transportation system under different network supply and demand conditions. The most fundamental determinant of the state of a transportation system is the time-varying traffic flow pattern on its roadway segments. It forms the basis for numerous engineering analyses used in operational- and planning-level decision-making. Dynamic traffic assignment (DTA) models are the leading modeling tools employed to determine time-varying traffic flow patterns under changing network conditions. DTA models have matured over the past three decades and are now being adopted by transportation planning agencies and traffic management centers. However, DTA models for large-scale regional networks require excessive computational resources. The problem becomes further compounded for applications such as congestion pricing, capacity calibration, and network design, for which DTA needs to be solved repeatedly as a sub-problem. This dissertation aims to improve the efficiency of DTA models and increase their viability for various planning and operational applications. To this end, a suite of computational methods based on a combinatorial approach for dynamic traffic assignment was developed. First, a new polynomial-run-time combinatorial algorithm for DTA was developed. The combinatorial DTA (CDTA) model complements and aids simulation-based DTA models rather than replacing them, because various policy measures and active traffic control strategies are best modeled using simulation-based DTA models. The solution obtained from the CDTA model was provided as an initial feasible solution to a simulation-based DTA model to improve its efficiency; this process is called "warm starting" the simulation-based DTA model. To further improve the efficiency of the simulation-based DTA model, the warm start process is made more efficient through parallel computing.
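The warm-start idea used here is generic: seeding an iterative procedure with a cheap approximate solution reduces the number of iterations needed to converge. A toy sketch on a small linear system solved by Jacobi iteration (illustrative only, unrelated to the DTA code):

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """Jacobi iteration for Ax = b; returns (solution, iterations to reach tol)."""
    D = np.diag(A)                 # diagonal entries
    R = A - np.diagflat(D)         # off-diagonal part
    x = x0.astype(float)
    for k in range(1, max_iter + 1):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, ord=np.inf) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # diagonally dominant, so Jacobi converges
b = np.array([1.0, 2.0])

_, cold_iters = jacobi(A, b, np.zeros(2))   # cold start from zero
approx = np.array([0.09, 0.64])              # cheap approximate solution (near 1/11, 7/11)
_, warm_iters = jacobi(A, b, approx)         # warm start
print(cold_iters > warm_iters)               # warm start converges in fewer iterations
```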
Parallel computing was applied to the CDTA model and to the traffic simulator used for warm starting. Finally, another warm start method, based on the static traffic assignment model, was tested on the simulation-based DTA model. The computational methods developed in this dissertation were tested on the Anaheim, CA and Winnipeg, Canada networks. Models warm-started using the CDTA solution performed better than the purely simulation-based DTA models in terms of equilibrium convergence metrics and run time. Warm start methods using solutions from the static traffic assignment models showed similar improvements. Parallelizing the CDTA model resulted in faster execution times by employing multiple processors. The parallel version of the traffic simulator can also be embedded into the simulation-assignment framework of simulation-based DTA models to improve their efficiency.

Item Parallel Markov Chain Monte Carlo Methods for Large Scale Statistical Inverse Problems (2014-04-18) Wang, Kainan
The Bayesian method has proven to be a powerful way of modeling inverse problems. The solution to a Bayesian inverse problem is the posterior distribution of the estimated parameters, which provides not only estimates of the inferred parameters but also the uncertainty of these estimates. Markov chain Monte Carlo (MCMC) is a useful technique for sampling the posterior distribution, and information can be extracted from the sampled ensemble. However, MCMC is very expensive to compute, especially in inverse problems where the underlying forward problems involve solving differential equations. Even worse, MCMC is difficult to parallelize due to its sequential nature; that is, under the current framework, we can barely accelerate MCMC with parallel computing. We develop a new framework of parallel MCMC algorithms, the Markov chain preconditioned Monte Carlo (MCPMC) method, for sampling Bayesian inverse problems.
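The preconditioning idea behind MCPMC is in the same family as two-stage (delayed-acceptance) samplers, where a cheap approximate density screens proposals and only the survivors are evaluated with the expensive model, while still targeting the exact posterior. A toy serial sketch of that general pattern, with a standard-normal "expensive" target and a made-up wider surrogate; this is not the MCPMC algorithm itself:

```python
import math, random

random.seed(0)

def log_target(x):      # the "expensive" model: standard normal log-density
    return -0.5 * x * x

def log_surrogate(x):   # cheap approximation: a wider normal (variance 1.5)
    return -0.5 * x * x / 1.5

def delayed_acceptance(n_steps, step=2.0):
    x, lt_x, ls_x = 0.0, log_target(0.0), log_surrogate(0.0)
    expensive_calls, samples = 0, []
    for _ in range(n_steps):
        y = x + random.uniform(-step, step)
        ls_y = log_surrogate(y)
        # Stage 1: screen the proposal using only the cheap surrogate.
        if math.log(random.random()) < ls_y - ls_x:
            # Stage 2: correct with the expensive model; this factor makes
            # the overall chain target the exact (expensive) density.
            lt_y = log_target(y)
            expensive_calls += 1
            if math.log(random.random()) < (lt_y - lt_x) - (ls_y - ls_x):
                x, lt_x, ls_x = y, lt_y, ls_y
        samples.append(x)
    return samples, expensive_calls

samples, calls = delayed_acceptance(20_000)
mean = sum(samples) / len(samples)
print(abs(mean) < 0.1, calls < 20_000)  # near-zero mean; fewer expensive evaluations than steps
```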
With the help of a fast auxiliary MCMC chain running on a computationally cheaper approximate model, which serves as a stochastic preconditioner to the target distribution, the sampler randomly selects candidates from the preconditioning chain for further processing on the accurate model. As this accurate-model processing can be executed in parallel, the algorithm is suitable for parallel systems. We implement it using a modified master-slave architecture, analyze its potential to accelerate sampling, and apply it to three examples. A two-dimensional Gaussian mixture example shows that the new sampler can bring statistical efficiency in addition to increased sampling speed. Through a 2D inverse problem with an elliptic equation as the forward model, we demonstrate the use of an enhanced error model to build the preconditioner. For a 3D optical tomography problem, we use adaptive finite element methods to build the approximate model. In both examples, the MCPMC successfully samples the posterior distributions with multiple processors, demonstrating efficient speedups compared to traditional MCMC algorithms. In addition, the 3D optical tomography example shows the feasibility of applying MCPMC to real-world, large-scale statistical inverse problems.

Item Parallel VLSI Circuit Analysis and Optimization (2012-02-14) Ye, Xiaoji
The prevalence of multi-core processors in recent years has introduced new opportunities and challenges to Electronic Design Automation (EDA) research and development. In this dissertation, several parallel Very Large Scale Integration (VLSI) circuit analysis and optimization methods that utilize the multi-core computing platform to tackle some of the most difficult contemporary Computer-Aided Design (CAD) problems are presented. The first CAD application addressed in this dissertation is analyzing and optimizing mesh-based clock distribution networks. A mesh-based clock distribution network (also known as a clock mesh) is used in high-performance microprocessor designs as a reliable way of distributing clock signals to the entire chip. The second CAD application addressed in this dissertation is Simulation Program with Integrated Circuit Emphasis (SPICE)-like circuit simulation. SPICE simulation is often regarded as the bottleneck of the design flow, and parallel circuit simulation has recently attracted a lot of attention. The first part of the dissertation discusses circuit analysis techniques. First, a combination of a clock-network-specific model order reduction algorithm and a port sliding scheme is presented to tackle the challenges in analyzing large clock meshes with a large number of clock drivers. Our techniques run much faster than standard SPICE simulation and existing model order reduction techniques, and they also provide a basis for clock mesh optimization. Then, a hierarchical multi-algorithm parallel circuit simulation (HMAPS) framework is presented as a novel technique for parallel circuit simulation. The inter-algorithm parallelism approach in HMAPS is completely different from existing intra-algorithm parallel circuit simulation techniques and achieves superlinear speedup in practice. The second part of the dissertation addresses parallel circuit optimization. A modified asynchronous parallel pattern search (APPS) based method that utilizes the efficient clock mesh simulation techniques for the clock driver size optimization problem is presented. Our modified APPS method runs much faster than a continuous optimization method and effectively reduces the clock skew for all test circuits. The third part of the dissertation describes parallel performance modeling and optimization of the HMAPS framework. The performance models and runtime optimization scheme further improve the speed of HMAPS.
The dynamically adapted HMAPS becomes a complete solution for parallel circuit simulation.

Item Seismic modeling and imaging with Fourier method : numerical analyses and parallel implementation strategies (2009-12) Chu, Chunlei, 1977-; Stoffa, Paul L., 1948-
Our knowledge of elastic wave propagation in general heterogeneous media with complex geological structures comes principally from numerical simulations. In this dissertation, I demonstrate through rigorous theoretical analyses and comprehensive numerical experiments that the Fourier method is a suitable method of choice for large-scale 3D seismic modeling and imaging problems, due to its high accuracy and computational efficiency. The most attractive feature of the Fourier method is its ability to produce highly accurate solutions on relatively coarse grids, compared with other numerical methods for solving wave equations. To further advance the Fourier method, I focus on two aspects of the method in this work: its implementation on modern clusters of computers and efficient high-order time stepping schemes. I propose two new parallel algorithms to improve the efficiency of the Fourier method on distributed-memory systems using MPI. The first algorithm employs non-blocking all-to-all communications to optimize the conventional parallel Fourier modeling workflow by overlapping communication with computation. With a carefully designed communication-computation overlapping mechanism, a large amount of communication overhead can be concealed when implementing different kinds of wave equations. The second algorithm combines the advantages of both the Fourier method and the finite difference method by using convolutional high-order finite difference operators to evaluate the spatial derivatives in the decomposed direction.
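The coarse-grid accuracy claim for the Fourier method can be illustrated with the standard pseudo-spectral derivative: differentiating a smooth periodic function via the FFT is accurate to near machine precision on a grid where a second-order finite difference still shows visible error. A generic NumPy sketch, not the dissertation's code:

```python
import numpy as np

n = 32                                   # deliberately coarse periodic grid
x = 2 * np.pi * np.arange(n) / n
f = np.sin(x)                            # test function with known derivative
exact = np.cos(x)

# Fourier (pseudo-spectral) derivative: multiply by i*k in wavenumber space.
k = np.fft.fftfreq(n, d=1.0 / n)         # integer wavenumbers for period 2*pi
df_fourier = np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

# Second-order centered finite difference on the same grid (periodic wrap).
h = 2 * np.pi / n
df_fd = (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)

print(np.max(np.abs(df_fourier - exact)))  # near machine precision
print(np.max(np.abs(df_fd - exact)))       # O(h^2) error, roughly 6e-3 here
```

The same coarse grid that is exact for the spectral derivative leaves the finite difference with a clearly measurable error, which is the trade-off the abstract describes.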
The high-order convolutional finite difference method guarantees satisfactory accuracy and provides the flexibility of using non-blocking point-to-point communications for efficient interprocessor data exchange, along with the possibility of overlapping communication and computation. As a result, this hybrid method achieves an optimized balance between numerical accuracy and computational efficiency. To improve the overall accuracy of time-domain Fourier simulations, I propose a family of new high-order time stepping schemes, based on a novel algorithm for designing time integration operators, to reduce temporal derivative discretization errors in a cost-effective fashion. I explore the pseudo-analytical method and propose high-order formulations to further improve its accuracy and its ability to deal with spatial heterogeneities. I also extend the pseudo-analytical method to solve the variable-density acoustic and elastic wave equations. I thoroughly examine the finite difference method by conducting complete numerical dispersion and stability analyses. I comprehensively compare the finite difference method with the Fourier method and provide a series of detailed benchmarking tests of these two methods under a number of different simulation configurations. The Fourier method outperforms the finite difference method, in terms of both accuracy and efficiency, in both the theoretical studies and the numerical experiments, which provides solid evidence that the Fourier method is a superior scheme for large-scale seismic modeling and imaging problems.

Item Thermodynamically consistent modeling and simulation of multiphase flows (2014-12) Liu, Ju; Hughes, Thomas J. R.
Multiphase flow is a familiar phenomenon from daily life and occupies an important role in physics, engineering, and medicine. The understanding of multiphase flows relies largely on the theory of interfaces, which is not well understood in many cases. To date, the Navier-Stokes-Korteweg equations and the Cahn-Hilliard equation represent two major branches of phase-field modeling. The Navier-Stokes-Korteweg equations describe a single-component fluid material with multiple states of matter, e.g., water and water vapor; Cahn-Hilliard-type models describe multi-component materials with immiscible interfaces, e.g., air and water. In this dissertation, a unified multiphase fluid modeling framework is developed based on rigorous mathematical and thermodynamic principles. This framework does not assume any ad hoc modeling procedures and is capable of formulating meaningful new models with an arbitrary number of different types of interfaces. In addition to the modeling, novel numerical technologies are developed in this dissertation, focusing on the Navier-Stokes-Korteweg equations. First, the notion of entropy variables is properly generalized to the functional setting, which results in an entropy-dissipative semi-discrete formulation. Second, a family of quadrature rules is developed and applied to generate fully discrete schemes. The resulting schemes have two main properties: they are provably dissipative in entropy and second-order accurate in time. In the presence of complex geometries and high-order differential terms, isogeometric analysis is invoked to provide accurate representations of computational geometries and robust numerical tools. A novel periodic transformation operator technology is also developed within the isogeometric context; it significantly simplifies the strong imposition of periodic boundary conditions. These attributes make the proposed technologies an ideal candidate for credible numerical simulation of multiphase flows. A general-purpose parallel computing software package, named PERIGEE, is developed in this work to provide an implementation framework for the above numerical methods.
A comprehensive set of numerical examples has been studied to corroborate the aforementioned theories. Additionally, a variety of application examples have been investigated, culminating in the boiling simulation. Importantly, the boiling model overcomes several challenges faced by traditional boiling models, owing to its thermodynamically consistent nature. The numerical results indicate the promising potential of the proposed methodology for a wide range of multiphase flow problems.

Item Towards river flow computation at the continental scale (2009-08) David, Cédric H., 1981-; Maidment, David R.; Yang, Zong-liang
The work presented in this dissertation addresses river network modeling at large scales using geographic information systems, parallel computing, and the latest advancements in atmospheric and land surface modeling. This work is motivated by the availability of a vector-based Geographic Information System dataset that describes the networks of streams and rivers in the United States and how they are connected. A land surface model called Noah-distributed is used to provide lateral inflow to an NHDPlus river network in the Guadalupe River Basin in Texas. Challenges related to the projection of gridded hydrographic data from one coordinate system to another are investigated. The different representations of the shape of the Earth used in atmospheric science (spherical) and hydrology (spheroidal) can lead to a significant North-South shift on the order of 20 km at mid latitudes. A river network model called RAPID is developed and applied in a four-year study of the Guadalupe and San Antonio River Basins in Texas using the river network of NHDPlus. Gage measurements are used to estimate flow wave celerities in a river network and to assess the quality of RAPID flow computations. The performance of RAPID in a massively parallel computing environment is tested; further investigation of its scalability is needed before using RAPID at the state or federal level.
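The roughly 20 km North-South shift mentioned above can be reproduced with a quick back-of-the-envelope check: it is essentially the difference between geodetic (spheroidal) and geocentric (spherical) latitude, which peaks at mid latitudes. Illustrative arithmetic with WGS84 parameters, not the dissertation's projection code:

```python
import math

E2 = 0.00669437999014          # WGS84 first eccentricity squared
EARTH_RADIUS_KM = 6371.0       # mean Earth radius used to convert degrees to km

def geocentric_latitude_deg(geodetic_deg):
    """Geocentric latitude corresponding to a geodetic latitude on WGS84."""
    phi = math.radians(geodetic_deg)
    return math.degrees(math.atan((1.0 - E2) * math.tan(phi)))

geodetic = 45.0                                     # mid latitude
shift_deg = geodetic - geocentric_latitude_deg(geodetic)
shift_km = math.radians(shift_deg) * EARTH_RADIUS_KM
print(round(shift_km, 1))  # ~21 km, matching the "order of 20 km" figure
```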
The replacement of the river routing scheme used in SIM-France, a hydro-meteorological model, by RAPID is investigated in a ten-year study of river flow in France. While the formulation of RAPID improves the functionality of SIM-France, the flow simulations are comparable in accuracy to those previously obtained by SIM-France. Sub-basin parameterization was found to improve model results. A single criterion for quantifying the quality of river flow simulations across several river gages in a river network is developed; it normalizes the square error of modeled flow to allow equal treatment of all gaging stations regardless of the magnitude of flow. Using this criterion as the cost function for parameter estimation in RAPID yields better results than increasing the degree of spatial variability in the model parameters.

Item Uncertainty propagation and conjunction assessment for resident space objects (2015-12) Vittaldev, Vivek; Russell, Ryan Paul, 1976-; Erwin, Richard S; Akella, Maruthi R; Bettadpur, Srinivas V; Humphreys, Todd E
Presently, the catalog of Resident Space Objects (RSOs) in Earth orbit tracked by the U.S. Space Surveillance Network (SSN) contains more than 21,000 objects. The size of the catalog continues to grow due to an increasing number of launches, improved tracking capabilities, and, in some cases, collisions. Simply propagating the states of these RSOs is a computational burden; additionally propagating the uncertainty distributions of the RSOs and computing collision probabilities increases the computational burden by at least an order of magnitude. Tools are developed that propagate the uncertainty of RSOs with Gaussian initial uncertainty from epoch until a close approach. The number of possible elements in a Gaussian Mixture Model (GMM) has been increased in the form of a precomputed library, and the strategy for multivariate problems has been formalized.
The accuracy of a GMM is further increased by propagating each element with a Polynomial Chaos Expansion (PCE). Both techniques reduce the number of function evaluations required for uncertainty propagation and result in a sliding scale where accuracy can be improved at the cost of increased computation time. A parallel implementation of the accurate benchmark Monte Carlo (MC) technique has been developed on the Graphics Processing Unit (GPU) that is capable of using samples from any uncertainty propagation technique to compute the collision probability. The GPU MC tool delivers speedups of up to two orders of magnitude compared to a serial CPU implementation. Finally, a CPU implementation of the collision probability computation using Cartesian coordinates requires orders of magnitude fewer function evaluations than an MC run. Fast computation of the inherent nonlinear growth of the uncertainty distribution in orbital mechanics, together with accurate computation of the collision probability, is essential for maintaining a future space catalog and for preventing uncontrolled growth of the debris population. The uncertainty propagation and collision probability computation methods and algorithms developed here are capable of running on personal workstations and stand to benefit users ranging from national space surveillance agencies to private satellite operators. The developed techniques are also applicable to many general uncertainty quantification and nonlinear estimation problems.
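At its simplest, the Monte Carlo collision-probability computation described above samples the relative position of the two objects at closest approach from an uncertainty distribution and counts how often it falls within the combined hard-body radius. A toy 2D encounter-plane sketch with made-up numbers, not the thesis's GPU implementation:

```python
import random, math

random.seed(42)

def mc_collision_probability(n_samples, mean, sigma, combined_radius):
    """Fraction of sampled relative positions inside the hard-body disk."""
    hits = 0
    for _ in range(n_samples):
        dx = random.gauss(mean[0], sigma[0])   # relative position, axis 1
        dy = random.gauss(mean[1], sigma[1])   # relative position, axis 2
        if math.hypot(dx, dy) < combined_radius:
            hits += 1
    return hits / n_samples

# Hypothetical encounter: 100 m mean miss distance, 200 m position sigma per axis,
# 20 m combined hard-body radius (all made-up numbers).
p = mc_collision_probability(200_000, (100.0, 0.0), (200.0, 200.0), 20.0)
print(p)  # a few times 1e-3 for these made-up numbers
```

Each sample is independent, which is why this computation parallelizes so naturally onto a GPU: the thesis reports up to two orders of magnitude speedup over a serial CPU version of exactly this kind of loop.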