Atomic block formation for explicit data graph execution architectures

dc.contributor.advisorMcKinley, Kathryn S.en
dc.contributor.advisorBurger, Douglas C., Ph. D.en
dc.contributor.committeeMemberKeckler, Stephen W.en
dc.contributor.committeeMemberMahlke, Scott A.en
dc.contributor.committeeMemberPingali, Keshaven
dc.creatorMaher, Bertrand Allenen
dc.date.accessioned2010-12-13T19:02:16Zen
dc.date.available2010-12-13T19:02:16Zen
dc.date.issued2010-08en
dc.date.submittedAugust 2010en
dc.date.updated2010-12-13T19:02:16Zen
dc.descriptiontexten
dc.description.abstractLimits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations. This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture. For such an architecture to achieve high performance, the compiler must form blocks effectively. The compiler must create large blocks of instructions to amortize the per-block overhead, but control-flow and content restrictions limit its options. Block formation should consider factors such as execution frequency, for example by selecting control-flow paths that are frequently executed; block size; and locality of computation, which can be exploited to reduce communication overheads. This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent to block formation, mitigating the tension between block-enlarging optimizations (if-conversion, tail duplication, loop unrolling, and loop peeling) and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and furthermore that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, this dissertation proposes an architectural implementation of variable-size blocks that allows the compiler to dramatically improve block efficiency. We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations. We find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies significantly across applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.en
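
To make the block-formation idea in the abstract concrete, the following minimal sketch shows one simple, profile-guided way a compiler might grow an atomic block along a hot control-flow path until a fixed instruction budget is reached. It is illustrative only: the class and function names (BasicBlock, hottest_successor, form_block) and the MAX_BLOCK_SIZE budget are assumptions for this example, not the dissertation's actual compiler or policies, which also weigh content restrictions, tail duplication, and locality.

# Hypothetical sketch of greedy, profile-guided atomic-block formation.
# All names and the size budget are illustrative assumptions, not the
# dissertation's implementation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_BLOCK_SIZE = 128  # assumed fixed per-block instruction budget

@dataclass
class BasicBlock:
    name: str
    num_insts: int
    # successor name -> profiled edge frequency
    successors: Dict[str, float] = field(default_factory=dict)

def hottest_successor(bb: BasicBlock, cfg: Dict[str, BasicBlock]) -> Optional[BasicBlock]:
    """Pick the most frequently taken successor edge, if any."""
    if not bb.successors:
        return None
    name = max(bb.successors, key=bb.successors.get)
    return cfg[name]

def form_block(seed: str, cfg: Dict[str, BasicBlock]) -> List[str]:
    """Grow an atomic block along the hot path until the size budget is hit."""
    block = [seed]
    size = cfg[seed].num_insts
    current = cfg[seed]
    while True:
        nxt = hottest_successor(current, cfg)
        if nxt is None or nxt.name in block:
            break  # no successor, or merging would form a cycle
        if size + nxt.num_insts > MAX_BLOCK_SIZE:
            break  # merging would exceed the per-block instruction budget
        block.append(nxt.name)
        size += nxt.num_insts
        current = nxt
    return block

# Example: a small CFG whose hot path is A -> B -> D
cfg = {
    "A": BasicBlock("A", 40, {"B": 0.9, "C": 0.1}),
    "B": BasicBlock("B", 50, {"D": 1.0}),
    "C": BasicBlock("C", 30, {"D": 1.0}),
    "D": BasicBlock("D", 60, {}),
}
print(form_block("A", cfg))  # ['A', 'B'] -- adding D (60 insts) would exceed 128

A real block former would additionally decide when tail duplication or if-conversion pays off and, with variable-size blocks, relax the fixed budget; this sketch only captures the hot-path, size-bounded growth that the abstract describes.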
dc.description.departmentComputer Sciencesen
dc.format.mimetypeapplication/pdfen
dc.identifier.urihttp://hdl.handle.net/2152/ETD-UT-2010-08-1904en
dc.language.isoengen
dc.subjectComputer architectureen
dc.subjectCompilersen
dc.subjectBlock formationen
dc.titleAtomic block formation for explicit data graph execution architecturesen
dc.type.genrethesisen
