Distributed trigger counting algorithms

Date

2010-12

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

A distributed system consists of a set of N processor nodes and a finite set of communication channels. It is frequently described as a directed graph in which each vertex represents a processor node and the edges represent the communication channels. A global snapshot of a distributed system consists of the local states of all the processor nodes and all of the in-transit messages of a distributed computation. This is meaningful as it corresponds to the global state where all the local states and communication channels of all the processor nodes in the system are recorded simultaneously. A classic example where snapshots are utilized is in the scenario of some failure where the system can restart from the last global snapshot. This is an important application of global snapshot algorithms as it forms the basis for fault-tolerance in distributed programs and aids in serviceability as a distributed program debugging mechanism. Another important application includes checkpointing and monitoring systems where a set of continuous global snapshots are employed to detect when a certain number of triggers have been received by the system.

When the distributed system is scaled in terms of an increase in the number of processor nodes and an increase in the number of expected triggers the message complexity increases and impacts the total overhead for the communication and computation of the global snapshot algorithm. In such a large distributed system, an optimal algorithm is vital so that the distributed application program that is employing the snapshots does not suffer from performance degradation as the size of the distributed system continues to grow over time. We are interested in global snapshot algorithms that offer lower bound message complexity and lower bound MaxLoad messages for large values of N processor nodes and large values of W expected triggers. In this report we study and simulate the Centralized, Grid based, Tree Based, and LayeredRand global snapshot algorithms then evaluate the algorithms for total number of messages (sent and received) and MaxLoad messages (sent and received) for the trigger counting problem in distributed computing. The report concludes with simulation results that compare the performance of the algorithms with respect to the total number of messages and MaxLoad messages required by each algorithm to detect when the number of W triggers have been delivered to the distributed system.

Description

text

Citation