Show simple item record

dc.contributor.advisorPatt, Yale N.en
dc.creatorTseng, Francis, 1976-en
dc.description.abstractThere is still much performance to be gained by out-of-order processors with wider issue widths. However, traditional methods of increasing issue width do not scale; that is, they drastically increase design complexity and power requirements. This dissertation introduces the braid, a compile-time generated entity that enables the execution core to scale to wider widths by exploiting the small fanout and short lifetime of values produced by the program. A braid captures dataflow and register usage information of the program which are known to the compiler but are not traditionally conveyed to the microarchitecture through the instruction set architecture. Braid processing requires identification by the compiler, minor augmentations to the instruction set architecture, and support by the microarchitecture. The execution core of the braid microarchitecture consists of a number of braid execution units (BEUs). The BEU is tailored to efficiently carry out the execution of a braid in an in-order fashion. Each BEU consists of a FIFO scheduler, a busy-bit vector, two functional units, and a small internal register file. The braid microarchitecture provides a number of opportunities for the reduction of design complexity. It reduces the port requirements of the renaming mechanism, it simplifies the steering process, it reduces the area, size, and port requirements of the register file, and it reduces the paths and port requirements of the bypass network. The complexity savings result in a design characterized by a lower power requirement, a shorter pipeline, and a higher clock frequency. On an 8-wide design, the result from executing braids is performance within 9% of a very aggressive conventional out-of-order microarchitecture with the complexity of an in-order implementation. Three bottlenecks are identified in the braid microarchitecture and a solution is presented to address each. The limitation on braid size is addressed by dynamic merging. The underutilization of braid execution resources caused by long-latency instructions is addressed by context sharing. The poor utilization of braid execution resources caused by single-instruction braids is addressed by heterogeneous execution resources.
dc.rightsCopyright © is held by the author. Presentation of this material on the Libraries' web site by University Libraries, The University of Texas at Austin was made possible under a limited license grant from the author who has retained all copyrights in the works.en
dc.subject.lcshComputer architectureen
dc.subject.lcshCompilers (Computer programs)en
dc.titleBraids: out-of-order performance with almost in-order complexityen
dc.title.alternativeOut-of-order performance with almost in-order complexityen
dc.description.departmentElectrical and Computer Engineeringen

Files in this item


There are no files associated with this item.

This item appears in the following Collection(s)

Show simple item record