A Parallel Graph Partitioner for STAPL

Date

2013-04-26

Abstract

Multi-core architectures are found in computing devices ranging from cell phones to supercomputers. Parallel applications running on these devices can solve larger problems in less time, but writing them is difficult: programmers must deal with low-level parallel mechanisms such as data distribution, inter-processor communication, and task placement. The goal of the Standard Template Adaptive Parallel Library (STAPL) is to provide a generic, high-level framework for developing parallel applications.

One of the first steps in a parallel application is to partition and distribute the data throughout the system. The graph is an important data structure for parallel applications because it can store large amounts of data and model many types of relations. A mesh, a special type of graph, is often used to model a spatial domain in scientific applications. Graph and mesh partitioning has many applications, such as VLSI circuit design, parallel task scheduling, and data distribution. Data distribution significantly impacts the performance of a parallel application.

In this thesis, we introduce the STAPL Parallel Graph Partitioner Framework. This framework provides a generic infrastructure to partition arbitrary graphs and meshes and to build customized partitioners. It includes a state-of-the-art parallel k-way multilevel scheme to partition arbitrary graphs, a parallel mesh partitioner with parameterized partition shape, and a customized partitioner used for discrete ordinates particle transport computations. Because the framework is part of a generic library, STAPL, the partitioning of the data and the development of the whole parallel application can be done in the same environment.

We present the framework's user-friendly interface and demonstrate its scalability when partitioning different mesh and graph benchmarks on a Cray XE6 system. We also highlight the performance of our customized unstructured mesh partitioner for a discrete ordinates particle transport code. The columnar decompositions we developed significantly reduce the execution time of simultaneous sweeps on unstructured meshes.
