Atomic block formation for explicit data graph execution architectures

dc.contributor.advisorMcKinley, Kathryn S.en
dc.contributor.advisorBurger, Douglas C., Ph. D.en
dc.contributor.committeeMemberKeckler, Stephen W.en
dc.contributor.committeeMemberMahlke, Scott A.en
dc.contributor.committeeMemberPingali, Keshaven
dc.creatorMaher, Bertrand Allenen
dc.date.accessioned2010-12-13T19:02:16Zen
dc.date.available2010-12-13T19:02:16Zen
dc.date.issued2010-08en
dc.date.submittedAugust 2010en
dc.date.updated2010-12-13T19:02:16Zen
dc.descriptiontexten
dc.description.abstractLimits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations. This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture. For such an architecture to achieve high performance, the compiler must form blocks effectively. The compiler must create large blocks of instructions to amortize the per-block overhead, but control-flow and content restrictions limit its options. Block formation should consider factors such as execution frequency, for example by selecting control-flow paths that are frequently executed; block size; and locality of computation, which can be exploited to reduce communication overheads. This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent to block formation, mitigating the tension between block-enlarging optimizations (if-conversion, tail duplication, loop unrolling, and loop peeling) and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and furthermore that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, this dissertation proposes an architectural implementation of variable-size blocks that allows the compiler to dramatically improve block efficiency. We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations. We find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies significantly across applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.en
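
To make the block-formation idea in the abstract concrete, the following minimal sketch shows one simple, profile-guided way a compiler might grow an atomic block along a hot control-flow path until a fixed instruction budget is reached. It is illustrative only: the class and function names (BasicBlock, hottest_successor, form_block) and the MAX_BLOCK_SIZE budget are assumptions for this example, not the dissertation's actual compiler or policies, which also weigh content restrictions, tail duplication, and locality.

# Hypothetical sketch of greedy, profile-guided atomic-block formation.
# All names and the size budget are illustrative assumptions, not the
# dissertation's implementation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_BLOCK_SIZE = 128  # assumed fixed per-block instruction budget

@dataclass
class BasicBlock:
    name: str
    num_insts: int
    # successor name -> profiled edge frequency
    successors: Dict[str, float] = field(default_factory=dict)

def hottest_successor(bb: BasicBlock, cfg: Dict[str, BasicBlock]) -> Optional[BasicBlock]:
    """Pick the most frequently taken successor edge, if any."""
    if not bb.successors:
        return None
    name = max(bb.successors, key=bb.successors.get)
    return cfg[name]

def form_block(seed: str, cfg: Dict[str, BasicBlock]) -> List[str]:
    """Grow an atomic block along the hot path until the size budget is hit."""
    block = [seed]
    size = cfg[seed].num_insts
    current = cfg[seed]
    while True:
        nxt = hottest_successor(current, cfg)
        if nxt is None or nxt.name in block:
            break  # no successor, or merging would form a cycle
        if size + nxt.num_insts > MAX_BLOCK_SIZE:
            break  # merging would exceed the per-block instruction budget
        block.append(nxt.name)
        size += nxt.num_insts
        current = nxt
    return block

# Example: a small CFG whose hot path is A -> B -> D
cfg = {
    "A": BasicBlock("A", 40, {"B": 0.9, "C": 0.1}),
    "B": BasicBlock("B", 50, {"D": 1.0}),
    "C": BasicBlock("C", 30, {"D": 1.0}),
    "D": BasicBlock("D", 60, {}),
}
print(form_block("A", cfg))  # ['A', 'B'] -- adding D (60 insts) would exceed 128

A real block former would additionally decide when tail duplication or if-conversion pays off and, with variable-size blocks, relax the fixed budget; this sketch only captures the hot-path, size-bounded growth that the abstract describes.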
dc.description.departmentComputer Sciencesen
dc.format.mimetypeapplication/pdfen
dc.identifier.urihttp://hdl.handle.net/2152/ETD-UT-2010-08-1904en
dc.language.isoengen
dc.subjectComputer architectureen
dc.subjectCompilersen
dc.subjectBlock formationen
dc.titleAtomic block formation for explicit data graph execution architecturesen
dc.type.genrethesisen
