Bottleneck identification and acceleration in multithreaded applications
When parallel applications do not fully utilize the cores that are available to them they are missing the opportunity to have better performance. Sometimes threads have to wait for other threads. I call the code segments that make other threads wait bottlenecks. Examples of these bottlenecks include contended critical sections, threads arriving late to barriers and the slowest stage of a pipelined program. Other times all threads are running but some of them, which I call lagging threads, are making less progress, setting the stage to become bottlenecks. My thesis proposes identifying the code segments that are more critical for performance and efficiently accelerating them using faster cores, by either migrating execution to large cores of an Asymmetric Chip Multi-Processor (ACMP) or executing locally on DVFS-accelerated cores. The key contribution of this dissertation is a Utility of Acceleration metric that combines a measure of the acceleration for each code segment with a measure of its criticality. This metric enables meaningful comparisons to decide which bottlenecks or lagging threads to accelerate with each of the available acceleration mechanisms. My evaluation shows significant performance improvement for single multithreaded applications and sets of multiple single- and multi-threaded applications, and also reduction in energy-delay product due to the efficient utilization of the available acceleration mechanisms.