Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation

dc.contributor.advisor: Van de Geijn, Robert A. [en]
dc.contributor.advisor: Kolda, Tamara G. [en]
dc.contributor.committeeMember: Stanton, John F [en]
dc.contributor.committeeMember: Pingali, Keshav [en]
dc.contributor.committeeMember: Hammond, Jeff R [en]
dc.contributor.committeeMember: Batory, Don S [en]
dc.creator: Schatz, Martin Daniel [en]
dc.creator.orcid: 0000-0002-6059-0490 [en]
dc.date.accessioned: 2016-02-11T19:02:10Z [en]
dc.date.accessioned: 2018-01-22T22:29:30Z
dc.date.available: 2016-02-11T19:02:10Z [en]
dc.date.available: 2018-01-22T22:29:30Z
dc.date.issued: 2015-12 [en]
dc.date.submitted: December 2015 [en]
dc.date.updated: 2016-02-11T19:02:10Z [en]
dc.description.abstract: A goal of computer science is to develop practical methods to automate tasks that are otherwise too complex or tedious to perform manually. Complex tasks can include determining a practical algorithm and creating the associated implementation for a given problem specification. Goal-oriented programming can make this systematic. Therefore, by expressing the task of creating implementations in terms of goal-oriented programming, we can rely on automated tools to create them. To do so, pertinent knowledge must be encoded, which requires a notation and language to define relevant abstractions. This dissertation focuses on distributed-memory parallel tensor computations arising from computational chemistry. Specifically, we focus on applications based on the tensor contraction operation of dense, non-symmetric tensors. Creating an efficient algorithm for a given problem specification in this domain is complex; creating an optimized implementation of a developed algorithm is even more complex, tedious, and error-prone. To this end, we encode pertinent knowledge for distributed-memory parallel algorithms for tensor contractions of dense, non-symmetric tensors. We do this by developing a notation for data distribution and redistribution that exposes a systematic procedure for deriving a family of algorithms for this operation for which efficient implementations exist. We validate the developed ideas by implementing them in the Redistribution Operations and Tensor Expressions application programming interface (ROTE API) and encoding them into an automated system, DxTer, for systematically generating efficient implementations from problem specifications. Experiments performed on the IBM Blue Gene/Q and Cray XC30 architectures, testing generated implementations for the spin-adapted coupled cluster singles and doubles method from computational chemistry, demonstrate impact both in terms of performance and storage requirements. [en]
dc.description.department: Computer Sciences [en]
dc.format.mimetype: application/pdf [en]
dc.identifier: doi:10.15781/T2XX0R [en]
dc.identifier.uri: http://hdl.handle.net/2152/33276 [en]
dc.language.iso: en [en]
dc.subject: High-performance computing [en]
dc.subject: Parallel programming [en]
dc.subject: Tensor computations [en]
dc.title: Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation [en]
dc.type: Thesis [en]
