Self-tuning dynamic voltage scaling techniques for processor design
Abstract
The Dynamic Voltage Scaling (DVS) technique has proven to be ideal in regard to balancing performance and energy consumption of a processor since it allows for almost cubic reduction in dynamic power consumption with only a nearly linear reduction in performance. Due to its virtue, the DVS technique has been used for the two main purposes: energy-saving and temperature reduction. However, recently, a Dynamic Voltage Scaled (DVS) processor has lost its appeal as process technology advances due to the increasing Process, Voltage and Temperature (PVT) variations. In order to make a processor tolerant to the increasing uncertainties caused by such variations, processor designers have used more timing margins. Therefore, in a modern-day DVS processor, reducing voltage requires comparatively more performance degradation when compared to its predecessors. For this reason, this technique has a lot of room for improvement for the following facts. (a) From an energy-saving viewpoint, excessive margins to account for the worst-case operating conditions in a DVS processor can be exploited because they are rarely used during run-time. (b) From a temperature reduction point of view, accurate prediction of the optimal performance point in a DVS processor can increase its performance. In this dissertation, we propose four performance improvement ideas from two different uses of the DVS technique.
In regard to the DVS technique for energy-saving, in this dissertation, we introduce three different types of margin reduction (or margin decision) techniques. First, we introduce a new indirect Critical Path Monitor (CPM) to make a conventional DVS processor adaptive to its given environment. Our CPM is composed of several Slope Generators, each of which generates similar voltage scaling slopes to those of potential critical paths under a process corner. Each CPR in the Slope Generator tracks the delays of potential critical paths with minimum difference at any condition in a certain voltage range. The CPRs in the same Slope Generator are connected to a multiplexer and one of them is selected according to a current voltage level. Calibration steps are done by using conventional speed-binning process with clock duty-cycle modulation.
Second, we propose a new direct CPM that is based on a non-speculative pre-sampling technique. A processor that is based on this technique predicts timing errors in the actual critical paths and undertakes preventive steps in order to avoid the timing errors in the event that the timing margins fall below a critical level. Unlike direct CPM that uses circuit-level speculative operation, although the shadow latch can have timing error, the main Flip-Flop (FF) of our direct CPM never fails, guaranteeing always-correct operation of the processor. Our non-speculative CPM is more suitable for high-performance processor designs than the speculative CPM in that it does not require original design modification and has lower power overhead.
Third, we introduce a novel method that determines the most accurate margin that is based on the conventional binning process. By reusing the hold-scan FFs in a processor, we reduce design complexity, minimize hardware overhead and increase error detecting accuracy. Running workloads on the processor with Stop-Go clock gating allows us to find which paths have timing errors during the speed binning steps at various, fixed temperature levels. From this timing error information, we can determine the different maximum frequencies for diverse operating conditions. This method has high degree of accuracy without having a large overhead.
In regard to the DVS technique for temperature reduction, we introduce a run-time temperature monitoring scheme that predicts the optimal performance point in a DVS processor with high accuracy. In order to increase the accuracy of the optimal performance point prediction, this technique monitors the thermal stress of a processor during run-time and uses several Look-Up Tables (LUTs) for different process corners. The monitoring is performed while applying Stop-Go clock gating, and the average EN value is calculated at the end of the monitoring time. Prediction of the optimal performance point is made using the average EN value and one of the LUTs that corresponds to the process corner under which the processor was manufactured. The simulation results show that we can achieve maximum processor performance while keeping the processor temperature within threshold temperature.