GPU-based Parallel Computing Models and Implementations for Two-party Privacy-preserving Protocols

Pu, Shi

GPU-based Parallel Computing Models and Implementations for Two-party Privacy-preserving Protocols

Date

2013-11-25

Authors

Pu, Shi

Abstract

In (two-party) privacy-preserving-based applications, two users use encrypted inputs to compute a function without giving out plaintext of their input values. Privacy-preserving computing algorithms have to utilize a large amount of computing resources to handle the encryption-decryption operations. In this dissertation, we study optimal utilization of computing resources on the graphic processor unit (GPU) architecture for privacy-preserving protocols based on secure function evaluation (SFE) and the Elliptic Curve Cryptographic (ECC) and related algorithms. A number of privacy-preserving protocols are implemented, including private set intersection (PSI), secret handshaking (SH), secure Edit distance (ED) and Smith-Waterman (SW) problems. PSI is chosen to represent ECC point multiplication related computations, SH for bilinear pairing, and the last two for SFE-based dynamic programming (DP) problems. They represent different types of computations, so that in-depth understanding of the benefits and limitations of the GPU architecture for privacy preserving protocols is gained.

For SFE-based ED and SW problems, a wavefront parallel computing model on the CPU-GPU architecture under the semi-honest security model is proposed. Low level parallelization techniques for GPU-based gate (de-)garbler, synchronized parallel memory access, pipelining, and general GPU resource mapping policies are developed. This dissertation shows that the GPU architecture can be fully utilized to speed up SFE-based ED and SW algorithms, which are constructed with billions of garbled gates, on a contemporary GPU card GTX-680, with very little waste of processing cycles or memory space.

For PSI and SH protocols and underlying ECC algorithms, the analysis in this research shows that the conventional Montgomery-based number system is more friendly to the GPU architecture than the Residue Number System (RNS) is. Analysis on experiment results further shows that the lazy reduction in higher extension fields can have performance benefits only when the GPU architecture has enough fast memory. The resulting Elliptic curve Arithmetic GPU Library (EAGL) can run 3350.9 R-ate (bilinear) pairing/sec, and 47000 point multiplication/sec at the 128-bit security level, on one GTX-680 card. The primary performance bottleneck is found to be lacking of advanced memory management functions in the contemporary GPU architecture for bilinear pairing operations. Substantial performance gain can be expected when the on-chip memory size and/or more advanced memory prefetching mechanisms are supported in future generations of GPUs.