Browsing by Subject "Parallel"

Now showing 1 - 11 of 11

A continuing investigation into the stress field around two parallel-edge cracks in a finite body
(Texas A&M University, 2005-02-17) Gilman, Justin Patrick
The goal of this research was to extend the investigation into a method to represent and analyze the stress field around two parallel edge cracks in a finite body. The Westergaard-Schwarz method combined with the local collocation method was used to analyze different cases of two parallel edge cracks in a finite body. This method was used to determine when two parallel edge cracks could be analyzed as isolated single edge cracks. Numerical experimentation was conducted using ABAQUS, which was used to obtain the coordinate and stress information required in the local collocation method. The numerical models were created by maintaining one crack at a fixed length while varying the length of the second crack as well as the separation distance of the two cracks. The results obtained through the local collocation method were compared with the J-integrals obtained from the finite element analysis to verify the accuracy of the results. The results obtained in the analysis showed that the major factor in determining when the second crack's stress field has to be considered was the crack separation distance. It was found that a reduction in the second crack's length did not have a significant effect on the overall stress intensity factors of the fixed crack. A larger change in the opening mode stress intensity factor was seen when varying the crack separation distance, along with a steady reduction in shear mode stress intensity factors as the crack separation was increased. The results showed that after a certain crack separation distance the two cracks could be analyzed separately without introducing significant error into the stress field calculations.
Biophysically Accurate Brain Modeling and Simulation using Hybrid MPI/OpenMP Parallel Processing
(2012-07-16) Hu, Jingzhen
In order to better understand the behavior of the human brain, it is very important to perform large-scale neural network simulation, which may reveal the relationship between the whole network activity and the biophysical dynamics of individual neurons. However, considering the complexity of the network and the large number of variables, researchers choose to either simulate smaller neural networks or use simple spiking neuron models. Recently, supercomputing platforms have been employed to greatly speed up the simulation of large brain models. However, these works still have limitations, such as the simplicity of the modeled network structures and the lack of biophysical details in the neuron models. In this work, we propose a parallel simulator using biophysically realistic neural models for the simulation of large-scale neural networks. In order to improve the performance of the simulator, we adopt several techniques, such as merging linear synaptic receptors mathematically and using two-level time steps, which significantly accelerate the simulation. In addition, we exploit the efficiency of parallel simulation through three parallel implementation strategies: MPI parallelization, MPI parallelization with dynamic load balancing schemes, and hybrid MPI/OpenMP parallelization. Through experimental studies, we illustrate the limitation of the MPI implementation due to the imbalanced workload among processors. It is shown that the two developed MPI load balancing schemes are not able to improve the simulation efficiency on the targeted parallel platform. Using 32 processors, the proposed hybrid approach, on the other hand, is more efficient than the MPI implementation and is about 31X faster than a serial implementation of the simulator for a network consisting of more than 100,000 neurons.
Finally, it is shown that for large neural networks, the presented approach is able to simulate the transition from the 3Hz delta oscillation to epileptic behaviors due to the alterations of underlying cellular mechanisms.
Fast parallel solution of heterogeneous 3D time-harmonic wave equations
(2012-12) Poulson, Jack Lesly; Ying, Lexing; Engquist, Bjorn; Fomel, Sergey; Ghattas, Omar; van de Geijn, Robert
Several advancements related to the solution of 3D time-harmonic wave equations are presented, especially in the context of a parallel moving-PML sweeping preconditioner for problems without large-scale resonances. The main contribution of this dissertation is the introduction of an efficient parallel sweeping preconditioner and its subsequent application to several challenging velocity models. For instance, 3D seismic problems approaching a billion degrees of freedom have been solved in just a few minutes using several thousand processors. The setup and application costs of the sequential algorithm were also respectively refined to O(γ^2 N^(4/3)) and O(γ N log N), where N denotes the total number of degrees of freedom in the 3D volume and γ(ω) denotes the modestly frequency-dependent number of grid points per Perfectly Matched Layer discretization. Furthermore, high-performance parallel algorithms are proposed for performing multifrontal triangular solves with many right-hand sides, and a custom compression scheme is introduced which builds upon the translation invariance of free-space Green’s functions in order to justify the replacement of each dense matrix within a certain modified multifrontal method with the sum of a small number of Kronecker products. For the sake of reproducibility, every algorithm exercised within this dissertation is made available as part of the open source packages Clique and Parallel Sweeping Preconditioner (PSP).
Massively-Parallel Direct Numerical Simulation of Gas Turbine Endwall Film-Cooling Conjugate Heat Transfer
(2011-02-22) Meador, Charles Michael
Improvements to gas turbine efficiency depend closely on cooling technologies, as efficiency increases with turbine inlet temperature. To aid in this process, simulations that consider real engine conditions are needed. The first step towards this goal is a benchmark study using direct numerical simulations of a single periodic film cooling hole that characterizes the error in adiabatic boundary conditions, a common numerical simplification. Two cases are considered: an adiabatic case and a conjugate case. The adiabatic case is for validation against previous work conducted by Pietrzyk and Peet. The conjugate case considers heat transfer in the solid endwall in addition to the fluid, eliminating any simplified boundary conditions. It also includes an impinging jet and plenum, typical of actual endwall configurations. The numerical solver is NEK5000, and the two cases were run on 504 and 128 processors for the adiabatic and conjugate cases respectively. The approximate combined time is 100,000 CPU hours. In the adiabatic case, the results show good agreement for average velocity profiles but overprediction of the film cooling effectiveness. A convergence study suggests that there may be an area of unresolved flow, and the film cooling momentum flux may be too high. Preliminary conjugate results show agreement with velocity profiles and significant differences in cooling effectiveness. Both cases will need to be refined near the cooling hole exit, and another convergence study done. The results from this study will be used in a larger case that considers an actual turbine vane and film cooling hole arrangement with real engine conditions.
Massively-Parallel Spectral Element Large Eddy Simulation of a Ring-Type Gas Turbine Combustor
(2012-07-16) Camp, Joshua Lane
The average and fluctuating components in a model ring-type gas turbine combustor are characterized using a Large Eddy Simulation at a Reynolds number of 11,000, based on the bulk velocity and the mean channel height. A spatial filter is applied to the incompressible Navier-Stokes equations, and a high pass filtered Smagorinsky model is used to model the sub-grid scales. Two cases are studied: one with only the swirler inlet active, and one with a single row of dilution jets activated, operating at a momentum flux ratio J of 100. The goal of both of these studies is to validate the capabilities of the solver NEK5000 to resolve important flow features inherent to gas turbine combustors by comparing qualitatively to the work of Jakirlic. Both cases show strong evidence of the Precessing Vortex Core, an essential flow feature in gas turbine combustors. Each case captures other important flow characteristics, such as corner eddies, and in general predicts bulk flow movements well. However, the simulations performed quite poorly in terms of predicting turbulence shear stress quantities. Difficulties in properly emulating the turbulent velocity entering the combustor for the swirl, as well as mesh quality concerns, may have skewed the results. Overall, though small length scale quantities were not accurately captured, the large scale quantities were, and this stress test on the HPF LES model will be built upon in future work that looks at more complex combustors.
Parallel Block Lanczos for solving large binary systems
(Texas Tech University, 2006-08) Peterson, Michael; Monico, Christopher J.; Allen, Edward J.; Smith, Philip W.
The Lanczos algorithm is very useful in solving a large, sparse linear system A x = y and then finding vectors in the kernel of A. Such a system may arise naturally through number factoring algorithms, for example. In this thesis, we present a variant of the binary Lanczos algorithm that directly finds vectors in the kernel of A without first solving the system. The number factoring algorithms ultimately require us to find such vectors. Our adaptation combines ideas of Peter Montgomery and Don Coppersmith. We also discuss implementation issues, including parallelization.
Parallel Seismic Ray Tracing
(2013-12-09) Jain, Tarun K
Seismic ray tracing is a common method for understanding and modeling seismic wave propagation. The wavefront construction (WFC) method handles wavefronts instead of individual rays, thereby providing a mechanism to control ray density on the wavefront. In this thesis we present the design and implementation of a parallel wavefront construction algorithm (pWFC) for seismic ray tracing. The proposed parallel algorithm is developed using the STAPL library for parallel C++ code. We present the idea of modeling ray tubes with an additional ray in the center to facilitate parallelism. The parallel wavefront construction algorithm is applied to a wide range of models: some are simple synthetic models that enable us to study various aspects of the method, while others are intended to be representative of basic geological features such as salt domes. We also present a theoretical model to understand the performance of the pWFC algorithm. We evaluate the performance of the proposed parallel wavefront construction algorithm on an IBM Power 5 cluster. We study the effect of using different mesh types and of varying the position and number of sources, among other factors. The method is shown to provide good scalable performance for different models. Load imbalance is also shown to be the major factor hindering the performance of the algorithm. We provide two load balancing algorithms to solve the load imbalance problem. These algorithms will be developed as an extension of the current work.
Parallelizing an interactive theorem prover : functional programming and proofs with ACL2
(2012-12) Rager, David Lawrence; Hunt, Warren A., 1958-; Browne, James C; Kaufmann, Matt; Moore, J S; Sawada, Jun; Witchel, Emmett
Multi-core systems have become commonplace; however, theorem provers often do not take advantage of the additional computing resources in an interactive setting. This research explores automatically using these additional resources to lessen the delay between when users submit conjectures to the theorem prover and when they receive feedback from the prover that is useful in discovering how to successfully complete the proof of a particular theorem. This research contributes mechanisms that permit applicative programs to execute in parallel while simultaneously preparing these programs for verification by a semi-automatic reasoning system. It also contributes a parallel version of an automated theorem prover, with management of user interaction issues, such as output and how inherently single-threaded, user-level proof features can be configured for use with parallel computation. Finally, this dissertation investigates the types of proofs that are amenable to parallel execution. This investigation yields the result that almost all proof attempts that require a non-trivial amount of time can benefit from parallel execution. Proof attempts executed in parallel almost always provide the aforementioned feedback sooner than if they executed serially, and their execution time is often significantly reduced.
Performance analysis of the Parallel Community Atmosphere Model (CAM) application
(2009-06-02) Shawky Sharkawi, Sameh Sherif
Efficient execution of parallel applications requires insight into how the parallel system features impact the performance of the application. Significant experimental analysis and the development of performance models enhance the understanding of such an impact. Deep understanding of an application's major kernels and their design leads to a better understanding of the application's performance, and hence, leads to development of better performance models. The Community Atmosphere Model (CAM) is the latest in a series of global atmospheric models developed at the National Center for Atmospheric Research (NCAR) as a community tool for NCAR and the university research community. This work focuses on analyzing CAM and understanding the impact of different architectures on this application. In the analysis of CAM, kernel coupling, which quantifies the interaction between adjacent kernels and chains of kernels in an application, is used. All experiments are conducted on four parallel platforms: NERSC (National Energy Research Scientific Computing Center) Seaborg, SDSC (San Diego Supercomputer Center) DataStar P655, DataStar P690 and PSC (Pittsburgh Supercomputing Center) Lemieux. Experimental results indicate that kernel coupling gave an insight into many of the application characteristics. One important characteristic of CAM is that its performance is heavily dependent on a parallel platform's memory hierarchy; different cache sizes and different cache policies had the major effect on CAM's performance. Also, coupling values showed that although CAM's kernels share many data structures, most of the coupling values are still destructive (i.e., interfering with each other so as to adversely affect performance). The kernel coupling results help developers point out the bottlenecks in memory usage in CAM. The results obtained from processor partitioning are significant in helping CAM users choose the right platform to run CAM.
Tradeoffs in parallel prefix adder structures
(2015-05) Stanley, Jonathan Barten; Swartzlander, Earl E., Jr., 1945-
This report presents the results of research on comparing the structures and qualities of fast parallel prefix adders. The binary adder serves as a fundamental component of many digital arithmetic operations. Many modern microprocessors and ASICs that require high-speed arithmetic logic often implement parallel prefix adders. Modern parallel prefix adder structures are based on previous works including those of Kogge-Stone, Brent-Kung, Ladner-Fischer, Knowles, and others; the designs presented in each work have their own merits and tradeoffs that are suitable for certain applications. Previous works have described standard and systematic ways to design and construct functional parallel prefix adder structures. Although the parallel prefix adder has been studied for decades, this work explores the possibility that non-standard and more optimal structures may exist by developing and utilizing a brute force search algorithm based on the prefix operator rules and properties to find all possible parallel prefix adder structures. The parallel prefix adder search algorithm design, search results and study of tradeoffs are discussed in this work.
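For readers unfamiliar with the prefix operator mentioned in the abstract above, the following is a minimal sketch of one of the standard structures it names, a Kogge-Stone network, simulated bitwise in Python. It is an illustration only and is not the report's search algorithm; the function name `kogge_stone_add` and the `width` parameter are assumptions introduced here.

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix network.

    Hypothetical illustration: each bit position holds a (generate, propagate)
    pair, packed into the integers g and p. The carry-in is assumed to be 0
    and the carry-out is dropped.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    g = a & b          # generate: this bit produces a carry on its own
    p = a ^ b          # propagate: this bit passes an incoming carry along
    half_sum = p       # per-bit sum before carries are applied

    d = 1
    while d < width:
        # Prefix operator applied at every bit in parallel, combining the
        # group ending at bit i with the group ending at bit i - d:
        #   (g, p) o (g', p') = (g | (p & g'), p & p')
        g = (g | (p & (g << d))) & mask   # uses the p from the previous stage
        p = (p & (p << d)) & mask
        d *= 2                            # log2(width) stages in total

    carries = (g << 1) & mask             # carry into bit i is the group generate of bits i-1..0
    return half_sum ^ carries


if __name__ == "__main__":
    print(kogge_stone_add(13, 7))
```

The loop runs about log2(width) times, which is the minimal logic depth that makes this structure fast; the cost is the dense wiring implied by the shifts, one of the tradeoffs against sparser structures such as Brent-Kung.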
Upscaling and parallel reservoir simulation
(2011) Wang, Kefei; Sepehrnoori, Kamy, 1951-; Killough, John E.
Reservoir characterization techniques have made possible geological reservoir models with multi-million grid blocks populated with permeability, porosity, and fluid saturations. These geological models are often too large to be simulated because of computational limits. These computational limits mean that typical full-field reservoir simulation models are limited to fewer than 1 million cells - at least two orders of magnitude smaller than the geological models. Upscaling techniques have been used to bridge the gap between these geological models and full-field reservoir simulation. Although there have been significant efforts in developing single-phase and two-phase upscaling algorithms, only limited verification of upscaling methods has been performed on a full-field basis. In addition to upscaling techniques, parallel simulation approaches have been developed to solve multi-million cell models with reasonable computational efficiency. Parallel simulations take up to a few hours of CPU time instead of days to run multi-million cell models. However, when many simulations are to be performed over a large range of parameter values for uncertainty studies, parallel simulations again become prohibitive and upscaling must be employed. On the other hand, the results from these upscaled simulations must be validated with results from fine-scale simulations to give confidence in the reliability of the results. There is really no way of knowing how good the results are unless we are able to perform the fine-scale simulations for verification. Parallel ultra-fine-scale simulations may provide the tool for this verification requirement. In this work, we developed several new single-phase upscaling algorithms, and investigated the verification of these techniques applied to a reservoir model and a synthetic model. For complicated multi-phase flow, the single-phase upscaling may lead to large errors.
To overcome the inaccuracy, a new relative permeability upscaling approach was investigated in this dissertation research. The new approach was verified using a three-phase, 3D, highly heterogeneous reservoir model. Based on case studies, the results from the fine-scale model may appropriately be used to guide the upscaling. The parallel simulation may guide engineers to find appropriate upscaled models through a tuning procedure. This tuning procedure has been explored in the current study to obtain results that are in close agreement with the fine-scale simulation results. The combination of parallel simulation technology and upscaling algorithms can be used to provide a better estimation of the amount of uncertainty in predicted oil recovery for real fields.
